The space complexity of recognizing well-parenthesized 
expressions in the streaming model: 
the Index function revisited 



Rahul Jain* Ashwin Nayak†

July 20, 2011 



Abstract 

We show an $\Omega(\sqrt{n}/T)$ lower bound for the space required by any unidirectional constant-error
randomized $T$-pass streaming algorithm that recognizes whether an expression over two
types of parenthesis is well-parenthesized. This proves a conjecture due to Magniez, Mathieu, 
and Nayak (2009) and rigorously establishes that bi-directional streams are exponentially more 
efficient in space usage as compared with unidirectional ones. 

We obtain the lower bound by analyzing the information that is necessarily revealed by the 
players about their respective inputs in a two-party communication protocol for a variant of 
the Index function, namely Augmented Index. We show that in any communication protocol 
that computes this function correctly with constant error on the uniform distribution (a "hard" 
distribution), either Alice reveals $\Omega(n)$ information about her $n$-bit input, or Bob reveals $\Omega(1)$
information about his $(\log n)$-bit input, even when the inputs are drawn from an "easy" distribution,
the uniform distribution over inputs which evaluate to 0.

The information cost trade-off is obtained by a novel application of conceptually simple
and familiar ideas such as average encoding and the cut-and-paste property of randomized
protocols. We further demonstrate the effectiveness of these techniques by extending the result
to quantum protocols. We show that in quantum protocols that compute the Augmented Index
function correctly with constant error on the uniform distribution, either Alice reveals $\Omega(n/t)$
information about her $n$-bit input, or Bob reveals $\Omega(1/t)$ information about his $(\log n)$-bit input,
where $t$ is the number of messages in the protocol, even when the inputs are drawn from the
abovementioned easy distribution.
1 Introduction



Streaming algorithms are designed to process massive input data, which cannot fit entirely in 
computer memory. Random access to such input is prohibitive, so ideally we would like to process 
it with a single sequential scan. Furthermore, during the computation, the algorithms are compelled 
to use space that is much smaller than the length of the input. Formally, streaming algorithms 
access the input sequentially, one symbol at a time, a small number of times (called passes), while 
attempting to solve some information processing task using as little space (and time) as possible. 
We refer the reader to the text [25] for a more thorough introduction to this topic.

One-pass streaming algorithms that use constant space and time recognize precisely the set of 
regular languages. It is thus natural to ask what the complexity of languages higher up in the 
Chomsky hierarchy is in the streaming model. In this work, we focus on a concrete such problem, 
that of checking whether an expression with different types of parenthesis is well-formed. The 
problem is formalized through the study of the language Dyck(2), which consists of all well-parenthesized
expressions over two types of parenthesis, denoted below by $a, \bar{a}$ and $b, \bar{b}$, with the
bar indicating a closing parenthesis.

Definition 1.1 Dyck(2) is the language over alphabet $\Sigma = \{a, \bar{a}, b, \bar{b}\}$ defined recursively as

\[ \mathrm{Dyck}(2) = \varepsilon + \left( a \cdot \mathrm{Dyck}(2) \cdot \bar{a} + b \cdot \mathrm{Dyck}(2) \cdot \bar{b} \right) \cdot \mathrm{Dyck}(2) , \]

where $\varepsilon$ is the empty string, '$\cdot$' indicates concatenation of strings (or subsets thereof) and '+'
denotes set union.

This deceptively simple language is in a certain precise sense complete for the class of context-free 
languages [10], and is implicit in a myriad of information processing tasks. 

There is a straightforward algorithm that recognizes Dyck(2) with logarithmic space, as we may 
run through all possible heights, and check parentheses at the same height. This scheme does not 
seem to easily translate to streaming algorithms, even with a small number of passes over the input. 
In fact, by appealing to the communication complexity of the equality function, we can deduce that 
any deterministic streaming algorithm for Dyck(2) that makes T passes over the input requires 
space $\Omega(n/T)$ on instances of length $n$. Another route is suggested by a small-space algorithm for
the word problem in the free group with 2 generators. This is a relaxation of Dyck(2) in which local
simplifications $\bar{p}p = \varepsilon$ are allowed in addition to $p\bar{p} = \varepsilon$ for every type of parenthesis $(p, \bar{p})$. There
is a logarithmic space algorithm for solving the word problem [22] that can easily be massaged into
a one-pass streaming algorithm with polylogarithmic space. Again, this algorithm does not extend
to Dyck(2). 
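
The height-by-height scheme mentioned above can be sketched as a multi-pass procedure. The Python sketch below is illustrative only: it encodes $a, b$ as `a`, `b` and the closing parentheses $\bar{a}, \bar{b}$ as `A`, `B` (an encoding chosen here, not taken from the article), and the function name is hypothetical. It stores only counters and one remembered symbol, i.e., $O(\log n)$ bits, but it may make up to $n/2 + 1$ passes over the input, which is why it does not directly yield a streaming algorithm with few passes.

```python
def is_dyck2_by_heights(w: str) -> bool:
    """Check membership in Dyck(2) using only counters, one height per pass.

    Assumed encoding: 'a', 'b' are openers; 'A', 'B' are the matching closers.
    """
    # First pass: ignore types, check the height profile, record the maximum height.
    height = max_height = 0
    for c in w:
        if c in "ab":
            height += 1
            max_height = max(max_height, height)
        elif c in "AB":
            height -= 1
            if height < 0:          # a closer with nothing open
                return False
        else:
            raise ValueError(f"unexpected symbol {c!r}")
    if height != 0:                 # some parenthesis is never closed
        return False

    # One further pass per height h: match each opener that reaches height h
    # with the next closer that leaves height h.
    for h in range(1, max_height + 1):
        height = 0
        pending = None              # type of the parenthesis currently open at height h
        for c in w:
            if c in "ab":
                height += 1
                if height == h:
                    pending = c
            else:
                if height == h and pending.upper() != c:
                    return False
                height -= 1
    return True
```

For example, `is_dyck2_by_heights("abBA")` accepts the encoding of $a\,b\,\bar{b}\,\bar{a}$, while `is_dyck2_by_heights("abAB")` rejects the crossing expression $a\,b\,\bar{a}\,\bar{b}$.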

We rigorously establish the impossibility of recognizing Dyck(2) with logarithmic space with a 
small number of passes in the streaming model. 

Theorem 1.1 Any unidirectional randomized $T$-pass streaming algorithm that recognizes length $n$
instances of Dyck(2) with a constant probability of error uses space $\Omega(\sqrt{n}/T)$.

A more precise statement of this theorem is presented as Corollary 3.2 later in this article. (Similarly,
the theorems we state below are made more precise in later sections.)

Dyck(2) was first studied in the context of the streaming model by Magniez, Mathieu, and 
Nayak [23], spurred by its practical relevance, e.g., its relationship to the processing of large XML
files, and the connection between formal language theory and complexity in the context of processing
massive data. They overcome the apparent difficulties described above and present two
sublinear space randomized streaming algorithms for Dyck(2). The first makes one pass over the 
input, recognizes well-parenthesized expressions with space $O(\sqrt{n \log n})$ bits, and has polynomially
small probability of error. Moreover, they establish an optimal space lower bound for one-pass
algorithms with polynomially small error. They prove that any one-pass algorithm that makes
error at most $1/(n \log n)$ uses space $\Omega(\sqrt{n \log n})$.

Perhaps more surprisingly, Magniez et al. show that the demand on space shrinks drastically when 
the algorithm is allowed another pass over the input. The second algorithm makes two passes 
over the input, uses only $O(\log^2 n)$ space, and has polynomially small probability of error. A
curious property of the second algorithm is that it makes the second pass in reverse order, and 
this seems essential for its performance. An obvious question then is whether this is an artefact of 
the algorithm, or if we could achieve similar reduction in space usage by making multiple passes 
in the same direction. The logarithmic space algorithm for Dyck(2) mentioned above translates 
to streaming algorithms with a linear number of passes, and suggests the possibility of algorithms 
with fewer (but more than one) passes, that use sub-polynomial space. Nonetheless, Magniez et al. 
conjecture that a bound similar to that for the one-pass algorithms holds for multi-pass streaming
algorithms if all passes are made in the same direction. Theorem 1.1 proves the above conjecture
and confirms the intuition that the ability to scan the input in either direction gives streaming
algorithms additional computational firepower. The bound we get for one-pass algorithms is a
factor of $\sqrt{\log n}$ better than the one in Ref. [23] for constant error probability, but falls short of
optimal by the same factor for polynomially small error. 

Theorem 1.1 is a consequence of a lower bound that we establish for the information cost of two-party
communication protocols for a variant of the Index problem. In this variant, the player
holding the index also receives a portion of the other party's input. More formally, one party,
Alice, has an $n$-bit string $x$, and the other party, Bob, has an integer $k \in [n]$, the prefix $x[1, k-1]$
of $x$, and a bit $b \in \{0, 1\}$. The goal is to compute the function $f_n(x, (k, x[1, k-1], b)) = x_k \oplus b$,
i.e., to determine whether $b = x_k$ or not. This problem was studied in the one-way communication
model as "serial encoding" [2, 26], and as "Augmented Index" [12, 17] and "Mountain problem" [23] in
later works. Informally speaking, we show that in any communication protocol that computes $f_n$
correctly with constant error on the uniform distribution $\mu$ (a "hard distribution"), either Alice
reveals $\Omega(n)$ information about her input $x$, or Bob reveals $\Omega(1)$ information about his input $k$, even
when the inputs are drawn from an "easy distribution" ($\mu_0$, the uniform distribution over $f_n^{-1}(0)$).
We formally define the notion of information cost ($\mathrm{IC}^A_\lambda(\Pi)$, $\mathrm{IC}^B_\lambda(\Pi)$) for a protocol $\Pi$ for the two
players Alice (A) and Bob (B) with respect to the distribution $\lambda$ in Section 2.2 and show:

Theorem 1.2 In any two-party randomized communication protocol $\Pi$ for the Augmented Index
function $f_n$ that makes constant error at most $\varepsilon \in [0, 1/4)$ on the uniform distribution $\mu$ over inputs,
either $\mathrm{IC}^A_{\mu_0}(\Pi) \in \Omega(n)$ or $\mathrm{IC}^B_{\mu_0}(\Pi) \in \Omega(1)$.

The connection between the Augmented Index function $f_n$ and streaming algorithms for Dyck(2)
was charted by Magniez et al. They map a streaming algorithm for Dyck(2) that uses space s to a 
multi-party communication protocol in which the messages are each of the same length s, and then 
bound s from below for protocols resulting from one-pass algorithms. The communication bound 
is derived using the information cost approach (see, for example, Refs. [U [281 13 113 El)) which 
reduces the task to bounding from below the information cost of Augmented Index. 

A notion of information cost for Index was studied previously by Jain, Radhakrishnan, and Sen [15]
in the context of privacy in communication. This notion differs from the one we study in two crucial
respects. First, it is defined in terms of the hard distribution for the problem (uniform over all 
inputs). Second, the hard distribution is a product distribution. The techniques they develop 
seem not to be directly relevant to the problem at hand, as we deal with an easy and non-product 
distribution. 

We devise a new method for analyzing the information cost of $f_n$ to arrive at Theorem 1.2. The
proof we present shows how the conceptually simple and familiar ideas such as average encoding 
and the cut-and-paste property of randomized protocols may be brought to bear on Augmented 
Index to derive the optimal (up to constant factors) information cost trade-off. We note that a 
stronger trade-off was established by Magniez, Mathieu, and Nayak [23] for two-message protocols
that start with Alice, and make polynomially small error. They show that either Alice reveals $\Omega(n)$
information about $x$, or Bob reveals $\Omega(\log n)$ information about $k$ in such protocols. This cannot
be reproduced with our techniques, as we do not restrict ourselves to this special form of protocol.
Indeed, for every $l \in \{1, 2, \ldots, \lfloor \log_2 n \rfloor\}$, there is a deterministic protocol for $f_n$ in which Bob
sends $l$ bits of $k$, and Alice responds with $n/2^l$ bits.
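
One way to realize the deterministic protocol just described is sketched below in Python. This is a minimal illustration under assumptions made here, not in the article: $n$ is taken to be a power of two, Bob sends the $l$ leading bits of $k - 1$ (written with $\log_2 n$ bits), and the function names are hypothetical. Note that this protocol does not even use Bob's prefix of $x$.

```python
from itertools import product


def augmented_index(x, k, b):
    """f_n(x, (k, x[1..k-1], b)) = x_k XOR b, with k counted from 1."""
    return x[k - 1] ^ b


def tradeoff_protocol(x, k, b, l):
    """Bob sends l bits of k; Alice answers with n / 2**l bits of x.

    Assumes n = len(x) is a power of two and 1 <= l <= log2(n).
    Returns (output, bits sent by Bob, bits sent by Alice).
    """
    n = len(x)
    block_len = n >> l                       # n / 2^l positions per block
    # Bob -> Alice: the l leading bits of (k - 1), i.e. which block holds position k.
    block = (k - 1) // block_len
    # Alice -> Bob: the bits of x inside that block.
    reply = x[block * block_len:(block + 1) * block_len]
    # Bob looks up x_k inside the block and XORs it with his bit b.
    output = reply[(k - 1) % block_len] ^ b
    return output, l, len(reply)
```

Varying `l` from 1 to $\log_2 n$ moves the communication from Alice to Bob: with `l = 1` Alice sends half of her input, and with `l = log2(n)` Bob reveals his entire index and Alice sends a single bit.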

In independent work, concurrent with ours, Chakrabarti, Cormode, Kondapally, and McGregor [7] 
derive a similar information cost trade-off for $f_n$. Their motivation is identical to ours: to study
the space required by unidirectional multi-pass streaming algorithms for Dyck(2), and they present
a similar space lower bound for such algorithms. While some of the basic tools from information
theory that they ultimately employ (e.g., the Chain Rule for mutual information and the Pinsker
Inequality) are equivalent to ours, they take a different, rather technical, route to these tools.
The first version of our article [13] and that of Chakrabarti et al. [8] contained trade-offs that were
weaker, albeit in different respects. After learning about each other's works, both groups strengthened
their respective proofs to achieve qualitatively the same results.

We demonstrate the power of the approach we take by extending it to quantum protocols for Augmented
Index. Starting with appropriate notions of quantum information cost ($\mathrm{QIC}^A_\lambda(\Pi)$, $\mathrm{QIC}^B_\lambda(\Pi)$)
for a protocol $\Pi$ for Augmented Index, we arrive at the following trade-off.

Theorem 1.3 In any two-party quantum communication protocol $\Pi$ (with read-only behaviour on
inputs and no intermediate measurements) for the Augmented Index function $f_n$ that has $t$
message exchanges and makes constant error at most $\varepsilon \in [0, 1/4)$ on the uniform distribution $\mu$
over inputs, either $\mathrm{QIC}^A_{\mu_0}(\Pi) \in \Omega(n/t)$ or $\mathrm{QIC}^B_{\mu_0}(\Pi) \in \Omega(1/t)$.

The quantum information cost trade-off involves a number of subtleties, such as quantifying infor- 
mation cost in the absence of a notion of a message transcript, one which also avoids any information 
leakage due to the non-product nature of the input distribution. The absence of an analogue to 
the Cut-and-Paste property introduces further complications. We circumvent the Cut-and-Paste 
property by adapting a hybrid argument due to Jain, Radhakrishnan, and Sen [14] that allows us
to analyze quantum protocols one message at a time. These issues are discussed in detail in Section 
4.2. 

We are not aware of quantum protocols that beat the classical information bounds, and believe 
the dependence of the trade-off in Theorem 1.3 on the number of rounds $t$ is a consequence of the
proof technique. The proof of the connection between quantum streaming algorithms and quantum 
protocols for Augmented Index breaks down in the process of defining a suitable notion of 
quantum information cost. We leave the possible implications for space lower bounds for quantum 
streaming algorithms to future work. Finally, we remark that the approaches taken by Magniez et 
al. [23] and Chakrabarti et al. [7] for showing information cost trade-off in classical protocols do
not seem to generalize to quantum protocols. They are based on analyzing the input distribution 
conditioned on the message transcript, for which no suitable quantum analogue is known. 

Communication problems involving the Index function capture a number of phenomena in the 
theory of computing, both classical and quantum, in addition to playing a fundamental role in 
the area of communication complexity [21]. For instance, they have been used to analyze data 
structures [21], the size of finite automata [3] and formulae [IHj, the length of locally decodable 
codes [18], learnability of quantum states [1], and sketching complexity [1]. We believe that the 
more nuanced properties of the Index function such as the one we establish here are of fundamental
importance, and are likely to find application in other contexts as well.



2 Classical information cost of Augmented Index 

In this section we present the first result of this article. We summarize the notational conventions 
we follow and the background from classical information theory that we assume in Section 2.1.
Then we develop the lower bound for classical protocols for Augmented Index in Section 2.2.

2.1 Information theory and communication complexity basics 

We reserve lower case letters like $x, k, m$ for bit-strings or integers, and capital letters like $X, K, M$
for random variables over the corresponding sample spaces. We use the same symbol for a random
variable and its distribution. As is standard, given jointly distributed random variables $AB$ over a
product sample space, $A$ represents the marginal distribution over the first component. We often
use $A|b$ as shorthand for the conditional distribution $A|(B = b)$ when the second random variable $B$
is clear from the context. For a string $x \in \{0,1\}^n$, and integers $i, j \in [n] = \{1, 2, \ldots, n\}$, we let
$x[i,j]$ denote the substring of consecutive bits $x_i \cdots x_j$. If $j < i$, the expression denotes the empty
string. This notation extends to random variables over $\{0,1\}^n$ in the obvious manner. When a
sample $z$ is drawn from distribution $Z$, we denote it as $z \leftarrow Z$.

The $\ell_1$-distance $\|A - B\|$ between two random variables $A, B$ over the same finite sample space $S$
is given by

\[ \|A - B\| = \sum_{i \in S} |A(i) - B(i)| . \]

(Recall that as per our notational convention $A(i), B(i)$ denote the probabilities assigned to $i \in S$
by $A, B$, respectively.) The Hellinger distance $\mathfrak{h}(A, B)$ between the random variables is defined as

\[ \mathfrak{h}(A, B) = \left[ \frac{1}{2} \sum_{i \in S} \left( \sqrt{A(i)} - \sqrt{B(i)} \right)^2 \right]^{1/2} . \]

Hellinger distance is a metric, and is related to $\ell_1$ distance in the following manner.

Proposition 2.1 Let $P, Q$ be distributions over the same sample space. Then

\[ \mathfrak{h}(P, Q)^2 \;\leq\; \frac{1}{2} \|P - Q\| \;\leq\; \sqrt{2}\, \mathfrak{h}(P, Q) . \]

The square of the Hellinger distance is jointly convex. 
Proposition 2.2 Let $P_i, Q_i$ be distributions over the same sample space for each $i \in [n]$, and
let $(\alpha_i)$ be a probability distribution over $[n]$. Let $P = \sum_{i=1}^{n} \alpha_i P_i$ and $Q = \sum_{i=1}^{n} \alpha_i Q_i$. Then

\[ \mathfrak{h}(P, Q)^2 \;\leq\; \sum_{i=1}^{n} \alpha_i \, \mathfrak{h}(P_i, Q_i)^2 . \]
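
As a sanity check on the two propositions above, the following Python sketch evaluates both distances on small hand-picked distributions. It assumes the standard normalization $\mathfrak{h}(P,Q)^2 = \frac{1}{2} \sum_i (\sqrt{P(i)} - \sqrt{Q(i)})^2$ for the Hellinger distance; the example distributions and helper names are made up for illustration.

```python
import math


def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def hellinger(p, q):
    """Hellinger distance with the 1/2 normalization, so that 0 <= h <= 1."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))


def mixture(weights, dists):
    """Convex combination sum_i weights[i] * dists[i] of probability vectors."""
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(len(dists[0]))]


# Illustrative distributions over a three-element sample space.
P1, Q1 = [0.7, 0.2, 0.1], [0.1, 0.3, 0.6]
P2, Q2 = [0.25, 0.25, 0.5], [0.5, 0.5, 0.0]
alpha = [0.4, 0.6]

# Relation between Hellinger and l1 distance: h^2 <= (1/2)||P - Q|| <= sqrt(2) h.
h = hellinger(P1, Q1)
half_l1 = 0.5 * l1_distance(P1, Q1)
relation_holds = h ** 2 <= half_l1 <= math.sqrt(2) * h

# Joint convexity of the squared Hellinger distance.
P, Q = mixture(alpha, [P1, P2]), mixture(alpha, [Q1, Q2])
mixed = hellinger(P, Q) ** 2
averaged = sum(a * hellinger(p, q) ** 2 for a, p, q in zip(alpha, [P1, P2], [Q1, Q2]))
convexity_holds = mixed <= averaged
```

Both `relation_holds` and `convexity_holds` evaluate to `True` for these inputs; a numerical check of this kind is of course no substitute for the proofs.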

We rely on a number of standard results from information theory in this work. For a comprehensive
introduction to the subject, we refer the reader to a text such as [11].

Let $\mathrm{H}(X)$ denote the Shannon entropy of the random variable $X$, and $\mathrm{I}(X : Y)$ denote the mutual
information between two random variables $X, Y$. We also use $\mathrm{H}(p)$ to denote the binary entropy
function when $p \in [0, 1]$.

The chain rule for mutual information states: 

Proposition 2.3 (Chain rule) Let $ABC$ be jointly distributed random variables. Then

\[ \mathrm{I}(AB : C) = \mathrm{I}(A : C) + \mathrm{I}(B : C \mid A) . \]
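
The chain rule can be checked numerically on any small joint distribution. The Python sketch below computes mutual information from entropies of marginals; the joint distribution over three bits is an arbitrary one made up for illustration, and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def entropy(joint, coords):
    """Shannon entropy (in bits) of the marginal of `joint` on the given coordinates."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in coords)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def mutual_information(joint, left, right, given=()):
    """I(left : right | given) = H(left, given) + H(right, given) - H(left, right, given) - H(given)."""
    return (entropy(joint, left + given) + entropy(joint, right + given)
            - entropy(joint, left + right + given) - entropy(joint, given))


# Arbitrary (unnormalized) weights on outcomes (a, b, c), then normalized.
weights = {(a, b, c): 1 + a + 2 * b + 3 * c + 2 * a * c for a, b, c in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {outcome: w / total for outcome, w in weights.items()}

A, B, C = (0,), (1,), (2,)
lhs = mutual_information(joint, A + B, C)                     # I(AB : C)
rhs = mutual_information(joint, A, C) + mutual_information(joint, B, C, given=A)
```

Here `lhs` and `rhs` agree up to floating-point error, as the chain rule predicts.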

The Average encoding theorem [20, 14] is a quantitative version of the intuition that two random
variables that are only weakly correlated are nearly independent. Stated differently, the conditional 
distribution of one given the other is close to its marginal distribution, if their mutual information 
is sufficiently small. 

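Proposition 2.4 (Average encoding theorem) Let $AB$ be jointly distributed random variables.
Then,

\[ \mathbb{E}_{b \leftarrow B} \; \mathfrak{h}(A|b \,,\, A)^2 \;\leq\; \kappa \, \mathrm{I}(A : B) , \]

where $\kappa$ is the constant $\frac{\ln 2}{2}$.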
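
The Average encoding theorem can be illustrated numerically. The Python sketch below assumes the constant $\kappa = \frac{\ln 2}{2}$ (the value that follows from the standard bound $\mathfrak{h}^2 \leq \frac{1}{2} D_{\mathrm{KL}}$ in nats, with mutual information measured in bits) and the normalization $\mathfrak{h}(P,Q)^2 = \frac{1}{2}\sum_i (\sqrt{P(i)} - \sqrt{Q(i)})^2$. The joint distribution is randomly generated and purely illustrative.

```python
import math
import random

KAPPA = math.log(2) / 2   # assumed value of the constant kappa


def hellinger_sq(p, q):
    """Squared Hellinger distance between two probability vectors."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def average_encoding_sides(joint):
    """joint[a][b] = Pr[A = a, B = b]. Returns (E_b h(A|b, A)^2, kappa * I(A : B))."""
    n_a, n_b = len(joint), len(joint[0])
    p_a = [sum(joint[a]) for a in range(n_a)]
    p_b = [sum(joint[a][b] for a in range(n_a)) for b in range(n_b)]
    expected_h_sq = 0.0
    info_bits = 0.0
    for b in range(n_b):
        if p_b[b] == 0:
            continue
        cond = [joint[a][b] / p_b[b] for a in range(n_a)]      # distribution of A given B = b
        expected_h_sq += p_b[b] * hellinger_sq(cond, p_a)
        # I(A : B) is the average over b of the relative entropy D(A|b || A), in bits.
        info_bits += p_b[b] * sum(c * math.log2(c / pa) for c, pa in zip(cond, p_a) if c > 0)
    return expected_h_sq, KAPPA * info_bits


rng = random.Random(0)
raw = [[rng.random() for _ in range(4)] for _ in range(3)]
total = sum(sum(row) for row in raw)
joint = [[v / total for v in row] for row in raw]

lhs, rhs = average_encoding_sides(joint)
```

For this joint distribution `lhs <= rhs`, in line with the theorem: when $A$ and $B$ are only weakly correlated, the conditional distributions $A|b$ are on average close to the marginal $A$.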

We refer the reader to the text [21] for an introduction to the model of two-party communication 
protocols. We use the following Cut-and-Paste property of private-coin communication protocols 
(see, e.g., Ref. ^ Lemma 6.3]). 

Proposition 2.5 (Cut-and-Paste) Let $\Pi$ be a two-party private coin communication protocol.
Let $M(x, y)$ denote the random variable representing the message transcript in $\Pi$ when the first
party has input $x$ and the second party has input $y$. Then for all pairs of inputs $(x, y)$ and $(u, v)$,

\[ \mathfrak{h}(M(x,y) \,,\, M(u,v)) = \mathfrak{h}(M(x,v) \,,\, M(u,y)) . \]
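
The Cut-and-Paste property rests on the fact that, in a private-coin protocol, the probability of a transcript factors into a part that depends only on the first party's input and a part that depends only on the second party's input. The Python sketch below checks the identity on a hypothetical two-message protocol with randomly chosen message distributions; the protocol, its alphabet sizes, and all names are assumptions made for illustration.

```python
import math
import random
from itertools import product

rng = random.Random(1)


def random_dist(size):
    """A random probability vector of the given length."""
    w = [rng.random() for _ in range(size)]
    total = sum(w)
    return [v / total for v in w]


INPUTS = (0, 1)
MSGS = (0, 1, 2)

# Hypothetical private-coin protocol: Alice sends m1 drawn from alice[x],
# then Bob sends m2 drawn from bob[(y, m1)].
alice = {x: random_dist(len(MSGS)) for x in INPUTS}
bob = {(y, m1): random_dist(len(MSGS)) for y in INPUTS for m1 in MSGS}


def transcript(x, y):
    """Distribution of the transcript (m1, m2) on inputs (x, y)."""
    return {(m1, m2): alice[x][m1] * bob[(y, m1)][m2] for m1, m2 in product(MSGS, MSGS)}


def hellinger(p, q):
    """Hellinger distance between two distributions given as dicts over the same keys."""
    return math.sqrt(0.5 * sum((math.sqrt(p[m]) - math.sqrt(q[m])) ** 2 for m in p))


x, y, u, v = 0, 0, 1, 1
lhs = hellinger(transcript(x, y), transcript(u, v))
rhs = hellinger(transcript(x, v), transcript(u, y))
```

The two distances coincide up to floating-point error, because $\sum_m \sqrt{\Pr[m|x,y]\,\Pr[m|u,v]}$ is unchanged when $y$ and $v$ are swapped.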
2.2 The classical information cost lower bound 

The main theorem in this article may be viewed as a trade-off between information revealed by the 
two parties about their inputs while computing the Augmented Index function $f_n$. We show that
at least one of the parties necessarily reveals "a lot" of information even on an "easy distribution" if
the protocol computes $f_n$ with bounded error on a "hard distribution". The notion of information
on which we focus is known as "internal information" in the literature (see, e.g., Ref. [6]).

Consider a randomized two-party communication protocol $\Pi$ which uses public randomness $R$, and
may additionally use private randomness. Suppose that $M$ is the message transcript of the protocol
when the inputs $X, Y$ to the two players, Alice and Bob, respectively, are sampled from the joint
distribution $\lambda$. The information cost of the protocol for Alice with respect to the distribution $\lambda$
is defined as $\mathrm{IC}^A_\lambda(\Pi) = \mathrm{I}(X : M \mid YR)$. The information cost of the protocol for Bob is defined
symmetrically as $\mathrm{IC}^B_\lambda(\Pi) = \mathrm{I}(Y : M \mid XR)$.

Recall that in the Augmented Index problem, one party, Alice, has an $n$-bit string $x$, and the
other party, Bob, has an integer $k \in [n]$, the prefix $x[1, k-1]$ of $x$, and a bit $b \in \{0, 1\}$. Their goal
is to compute the function $f_n(x, (k, x[1, k-1], b)) = x_k \oplus b$, i.e., to determine whether $b = x_k$ or
not, by engaging in a two-party communication protocol.

Let $(X, K, B)$ be random variables distributed according to $\mu$, the uniform distribution over $\{0,1\}^n \times
[n] \times \{0,1\}$. Let $\mu_0$ denote the distribution $\mu$ conditioned upon $B = X_K$, i.e., when the inputs are
chosen uniformly from the set of 0s of $f_n$. We are interested in the information cost of a protocol
$\Pi$ with public randomness $R$ for Augmented Index under the distribution $\mu_0$, for the two
parties. Let $M$ denote the entire message transcript under $\mu$, and let $M^0$ denote the transcript
under distribution $\mu_0$. Then the information cost of $\Pi$ is given by $\mathrm{IC}^A_{\mu_0}(\Pi) = \mathrm{I}(X : M^0 \mid X[1, K]\, R)$
and $\mathrm{IC}^B_{\mu_0}(\Pi) = \mathrm{I}(K : M^0 \mid XR)$. The use of the notation $M^0$ is equivalent to conditioning on the
event $X_K = B$, i.e., imposing the distribution $\mu_0$, and helps us present our arguments more cleanly.
Note also that under the distribution $\mu_0$, we write Bob's input as the prefix $X[1, K]$.
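
The two distributions can be made concrete for small $n$ by enumerating their supports. The Python sketch below is illustrative only (the function names are hypothetical): it lists the support of the uniform distribution $\mu$, restricts it to the inputs with $B = X_K$ to obtain the support of $\mu_0$, and confirms that on $\mu_0$ Bob's input is exactly the prefix $X[1, K]$.

```python
from itertools import product


def augmented_index(x, k, b):
    """f_n(x, (k, x[1..k-1], b)) = x_k XOR b, with k counted from 1."""
    return x[k - 1] ^ b


def support_mu(n):
    """All inputs (x, k, b); mu is uniform over this list."""
    return [(x, k, b) for x in product((0, 1), repeat=n)
            for k in range(1, n + 1) for b in (0, 1)]


def support_mu0(n):
    """Inputs with b = x_k; mu_0 is uniform over this list."""
    return [(x, k, b) for (x, k, b) in support_mu(n) if b == x[k - 1]]


def bob_bits(x, k, b):
    """The bit string Bob holds: the prefix x[1..k-1] followed by his bit b."""
    return x[:k - 1] + (b,)


n = 4
mu = support_mu(n)
mu0 = support_mu0(n)
all_zero_on_mu0 = all(augmented_index(x, k, b) == 0 for x, k, b in mu0)
prefix_on_mu0 = all(bob_bits(x, k, b) == x[:k] for x, k, b in mu0)
```

For $n = 4$ the support of $\mu$ has $n \cdot 2^{n+1} = 128$ points and that of $\mu_0$ has 64, and the function value is 0 on every point of $\mu_0$.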

Since the value of the Augmented Index function $f_n$ is a constant on $\mu_0$, there is no a priori
reason for the information cost of any party in a protocol to be large. However, we additionally
require the protocol to be correct with non-trivial probability on the uniform distribution, under
which there is equal chance of the function being 0 or 1. If the information cost (under $\mu_0$) of the two
parties is sufficiently low, we show that neither party can determine with high enough confidence
what the function value is. The intuition behind this is as follows. Suppose we restrict the inputs
to $\mu_0$. If Bob's input $K$ is changed, the random variables in Alice's possession, specifically the
message transcript $M^0$ conditioned on her inputs, are not perturbed by much. This is because they
give her little information about K. Similarly, if we flip one of the bits of Alice's input X outside 
of the prefix with Bob, the random variables in Bob's possession at the end of the protocol are 
not perturbed by much. Formally, these properties follow from the Average Encoding Theorem. 
Observe that if we simultaneously change Bob's index $K$ to some $L > K$ (while maintaining the
condition that $X_L = B$), and flip the $L$th bit of $X$, we switch from a 0-input of $f_n$ to a 1-input. The
Cut-and-Paste Lemma ensures that by simultaneously changing the inputs with the two parties, the 
message transcript is perturbed by at most the sum of the amounts when the inputs are changed 
one at a time. This implies that the message transcript does not sufficiently help either party 
compute the function value. 

We formalize this intuition in the next theorem, which we state for even n. A similar result holds 
for odd n, and may be derived from the proof for the even case. 

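Theorem 2.6 For any two-party randomized communication protocol $\Pi$ for the Augmented Index
function $f_n$ with $n$ even, that makes error at most $\varepsilon \in [0, 1/4)$ on the uniform distribution $\mu$
over inputs, we have

\[ \left[ \frac{\mathrm{IC}^A_{\mu_0}(\Pi)}{n} \right]^{1/2} + \left[ 2 \cdot \mathrm{IC}^B_{\mu_0}(\Pi) \right]^{1/2} \;\geq\; \frac{1 - 4\varepsilon}{4\sqrt{\ln 2}} - \left[ \frac{\mathrm{H}(2\varepsilon)}{n} \right]^{1/2} , \]

where $\mu_0$ is the uniform distribution over $f_n^{-1}(0)$. In particular, for any $\varepsilon$ smaller than 1/4 by a
constant, either $\mathrm{IC}^A_{\mu_0}(\Pi) \in \Omega(n)$ or $\mathrm{IC}^B_{\mu_0}(\Pi) \in \Omega(1)$.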
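
To get a feel for the constants in this trade-off, the Python sketch below solves the inequality for Bob's information cost. It assumes the inequality has the form $[\mathrm{IC}^A_{\mu_0}(\Pi)/n]^{1/2} + [2 \cdot \mathrm{IC}^B_{\mu_0}(\Pi)]^{1/2} \geq \frac{1 - 4\varepsilon}{4\sqrt{\ln 2}} - [\mathrm{H}(2\varepsilon)/n]^{1/2}$, which is what the proof yields when the Average encoding constant is taken to be $\kappa = \frac{\ln 2}{2}$; the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bob_cost_lower_bound(alice_cost, n, eps):
    """Lower bound on Bob's information cost implied by the assumed inequality,
    when Alice's information cost is at most `alice_cost`, inputs have n bits,
    and the protocol errs with probability at most eps < 1/4 on the uniform distribution.
    """
    rhs = (1 - 4 * eps) / (4 * math.sqrt(math.log(2))) - math.sqrt(binary_entropy(2 * eps) / n)
    slack = rhs - math.sqrt(alice_cost / n)
    # sqrt(2 * IC_B) >= slack, so IC_B >= slack^2 / 2 whenever slack is positive.
    return 0.5 * slack ** 2 if slack > 0 else 0.0
```

For instance, with zero error and $\mathrm{IC}^A_{\mu_0}(\Pi) = 0$ the bound evaluates to $\frac{1}{32 \ln 2} \approx 0.045$ bits for Bob, while once Alice reveals on the order of $n$ bits the bound becomes vacuous, matching the "either/or" form of the theorem.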
Proof: Consider a protocol $\Pi$ as in the statement of the theorem. Let the inputs be given by
random variables $X, K, B$, drawn from the distribution $\mu$, let $d := \mathrm{IC}^A_{\mu_0}(\Pi)/n$, and let $c := \mathrm{IC}^B_{\mu_0}(\Pi)$.

To simplify the presentation, we suppress the public randomness R used in the protocol, i.e., 
assume that Alice and Bob only use private coins. This does not affect the generality of our proof; 
all the arguments below hold mutatis mutandis when the random variables are replaced by those 
conditioned on a specific value $r$ for the public random coins $R$, and the parameters $(d, c, \varepsilon)$ are
replaced by the corresponding quantities $(d_r, c_r, \varepsilon_r)$. Averaging the final inequality over $R$ and
applying the Jensen Inequality gives us the claimed bound, as the inequality is of the same form 
as in the statement of the theorem. 

Let M be the entire message transcript of the protocol. Without loss of generality, we assume that 
Bob computes the output of the protocol. If Alice computes the output, we include an additional 
message from her to Bob consisting of the output. This only marginally increases the information
revealed by Alice. Indeed, if the single bit output of the protocol is $O^0$ under the distribution $\mu_0$,
$\mathrm{H}(O^0) \leq \mathrm{H}(2\varepsilon)$, as the protocol produces the correct output with probability at least $1 - 2\varepsilon$ on the
distribution $\mu_0$. Therefore,

\[ \begin{aligned}
\mathrm{I}(X : M^0 O^0 \mid X[1,K]) &= \mathrm{I}(X : M^0 \mid X[1,K]) + \mathrm{I}(X : O^0 \mid M^0 X[1,K]) \\
&\leq dn + \mathrm{H}(O^0) ,
\end{aligned} \]

and $\mathrm{I}(K : M^0 O^0 \mid X) = \mathrm{I}(K : M^0 \mid X)$. Henceforth, we assume that the output of the protocol $\Pi$ is
computed by Bob, and its information costs are bounded as $\mathrm{IC}^A_{\mu_0}(\Pi) \leq d_1 n$ with $d_1 = d + \mathrm{H}(2\varepsilon)/n$,
and $\mathrm{IC}^B_{\mu_0}(\Pi) \leq c$.

We show below that the random variables $M^0 X[1, K]$ with Bob are close in distribution to $M^1 X[1, K-1]\, \bar{X}_K$,
where $M^1$ denotes the transcript $M$ conditioned on the function value being 1, i.e., when $B = \bar{X}_K$.

Lemma 2.7 $\| M^0 X[1,K] - M^1 X[1,K-1]\, \bar{X}_K \| \leq 1 + 8\sqrt{\kappa c} + 4\sqrt{2 \kappa d_1}$, where $\kappa = \frac{\ln 2}{2}$.

Since the protocol $\Pi$ identifies the two distributions, $M^0 X[1, K]$ and $M^1 X[1, K-1]\, \bar{X}_K$, with average
error $\varepsilon$, we have $\| M^0 X[1,K] - M^1 X[1,K-1]\, \bar{X}_K \| \geq 2(1 - 2\varepsilon)$. The theorem follows. ■



We now prove the heart of the theorem, i.e., that the message transcripts for the 0 and 1 inputs are
close to each other in distribution. 

Proof of Lemma 2.7: When we wish to explicitly write the transcript $M$ as a function of the
inputs to Alice and Bob, say $x$ and $x[1, k-1], b$ respectively, we write it as $M(x ;\, x[1, k-1], b)$.
If $b = x_k$, we write Bob's input as $x[1, k]$.

For any $x \in \{0,1\}^n$ and $i \in [n]$, let $x^{(i)}$ denote the string that equals $x$ in all coordinates except at
the $i$th. Note that $M^1 = M(X ;\, X[1, K-1], \bar{X}_K)$ has the same distribution as $M(X^{(K)} ;\, X[1, K])$,
since $X$ and $X^{(K)}$ are identically distributed. Thus, our goal is to bound

\[ \left\| M(X ;\, X[1,K])\, X[1,K] - M(X^{(K)} ;\, X[1,K])\, X[1,K] \right\| . \]

For reasons that become apparent as we develop our proof, we bound the above quantity when $K$
is larger than $n/2$. Let $L$ be uniformly and independently distributed in $[n] - [n/2]$. Then

\[ \left\| M(X ;\, X[1,K])\, X[1,K] - M(X^{(K)} ;\, X[1,K])\, X[1,K] \right\|
\;\leq\; 1 + \frac{1}{2} \left\| M(X ;\, X[1,L])\, X[1,L] - M(X^{(L)} ;\, X[1,L])\, X[1,L] \right\| . \tag{2.1} \]

So it suffices to bound the RHS above.

Recall that our goal is to show that, on average, changing from a 0-input to a 1-input does not 
perturb the message transcript by much. For this, we begin by showing that changing Alice's input 
alone, or similarly, Bob's input alone, has this kind of effect. If the information cost of Bob is 
small, the message transcript does not carry much information about K when the inputs are drawn 
from $\mu_0$. From this, we deduce that the transcript is (on average) nearly the same for different
inputs to Bob. 

Let J be uniformly and independently distributed in [n/2], and let L be as defined above. We 
compare the transcript when Bob's input index is J to when it is L. 

Lemma 2.8 $\mathbb{E}_{(x,j,l) \leftarrow (X,J,L)} \; \mathfrak{h}\big( M(x ;\, x[1,j]) \,,\, M(x ;\, x[1,l]) \big)^2 \leq 8 \kappa c$.

We defer the proof to later in this section. 

When changing Alice's input, we would like to ensure that the prefix held by Bob does not change. 
It is for this reason that we restrict our attention to Bob's inputs with index $K \in [n/2]$, and change
Alice's input by flipping the $L$th bit, with $L \in [n] - [n/2]$. If the information cost of Alice is small,
$M^0$ does not carry much information about $X$, even given a prefix. Therefore, flipping a bit outside
the prefix does not perturb the transcript by much. 

Lemma 2.9 We have

\[ \mathbb{E}_{(x[1,l],j,l) \leftarrow (X[1,L],J,L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,j]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,j]) \big)^2 \;\leq\; 16 \kappa d_1 . \]



This is proven later in the section. 

We now conclude the proof of Lemma 2.7. Since Hellinger distance squared is jointly convex
(Proposition 2.2), Lemma 2.8 gives us

\[ \mathbb{E}_{(x[1,l],j,l) \leftarrow (X[1,L],J,L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,j]) \,,\, M(x[1,l]\, X[l+1,n] ;\, x[1,l]) \big) \;\leq\; \sqrt{8 \kappa c} . \tag{2.2} \]

Along with the Triangle Inequality, and Lemma 2.9, this implies that

\[ \mathbb{E}_{(x[1,l],j,l) \leftarrow (X[1,L],J,L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,l]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,j]) \big) \;\leq\; \sqrt{8 \kappa c} + \sqrt{16 \kappa d_1} . \]

Using the Cut-and-Paste property of communication protocols (Proposition 2.5), we conclude that
simultaneously changing Bob's input from $x[1,j]$ to $x[1,l]$ and flipping the $l$th bit of $x$ perturbs the
transcript by no more than the individual changes.

\begin{align}
& \mathbb{E}_{(x[1,l],j,l) \leftarrow (X[1,L],J,L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,j]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,l]) \big) \notag \\
&\quad = \mathbb{E}_{(x[1,l],j,l) \leftarrow (X[1,L],J,L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,l]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,j]) \big) \notag \\
&\quad \leq \sqrt{8 \kappa c} + \sqrt{16 \kappa d_1} . \tag{2.3}
\end{align}

Combining Eq. (2.2) and Eq. (2.3), and using the Triangle Inequality we get

\[ \mathbb{E}_{(x[1,l],l) \leftarrow (X[1,L],L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,l]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,l]) \big) \;\leq\; 4\sqrt{2 \kappa c} + 4\sqrt{\kappa d_1} . \]

Using Proposition 2.1, we translate this back to a bound on $\ell_1$ distance:

\begin{align*}
& \left\| M(X ;\, X[1,L])\, X[1,L] - M(X^{(L)} ;\, X[1,L])\, X[1,L] \right\| \\
&\quad \leq \mathbb{E}_{(x[1,l],l) \leftarrow (X[1,L],L)} \left\| M(x[1,l]\, X[l+1,n] ;\, x[1,l]) - M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,l]) \right\| \\
&\quad \leq 16\sqrt{\kappa c} + 8\sqrt{2 \kappa d_1} .
\end{align*}

The lemma follows by combining this with Eq. (2.1). ■

We return to the deferred proofs. 

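Proof of Lemma 2.8: Consider the random variable $\widetilde{M}$ jointly distributed with $X, K$ which is
implicitly defined by the equation $K X \widetilde{M} = K \otimes (X M^0)$, where the latter is the product of the two
distributions $K$, and the marginal $X M^0$. Then,

Lemma 2.10 $\mathbb{E}_{(x,k) \leftarrow (X,K)} \; \mathfrak{h}\big( M(x ;\, x[1,k]) \,,\, \widetilde{M}(x) \big)^2 \leq \kappa c$, where $\kappa = \frac{\ln 2}{2}$.

Proof: From the Average Encoding Theorem, Proposition 2.4, we have that for every $x \in \{0,1\}^n$,

\[ \mathbb{E}_{k \leftarrow K} \; \mathfrak{h}\big( M(x ;\, x[1,k]) \,,\, \widetilde{M}(x) \big)^2 \;\leq\; \kappa \, \mathrm{I}(K : M^0 \mid X = x) , \]

which implies the lemma:

\[ \mathbb{E}_{(x,k) \leftarrow (X,K)} \; \mathfrak{h}\big( M(x ;\, x[1,k]) \,,\, \widetilde{M}(x) \big)^2 \;\leq\; \kappa \, \mathrm{I}(K : M^0 \mid X) . \]

■

An immediate consequence of the above lemma is that

\begin{align*}
\mathbb{E}_{(x,j) \leftarrow (X,J)} \; \mathfrak{h}\big( M(x ;\, x[1,j]) \,,\, \widetilde{M}(x) \big)^2 &\leq 2 \kappa c , \quad \text{and} \\
\mathbb{E}_{(x,l) \leftarrow (X,L)} \; \mathfrak{h}\big( M(x ;\, x[1,l]) \,,\, \widetilde{M}(x) \big)^2 &\leq 2 \kappa c .
\end{align*}

By the Triangle Inequality, for any $j \in [n/2]$, $l \in [n] - [n/2]$, and $x \in \{0,1\}^n$,

\begin{align*}
\mathfrak{h}\big( M(x ;\, x[1,j]) \,,\, M(x ;\, x[1,l]) \big)^2
&\leq \Big( \mathfrak{h}\big( M(x ;\, x[1,j]) \,,\, \widetilde{M}(x) \big) + \mathfrak{h}\big( M(x ;\, x[1,l]) \,,\, \widetilde{M}(x) \big) \Big)^2 \\
&\leq 2\, \mathfrak{h}\big( M(x ;\, x[1,j]) \,,\, \widetilde{M}(x) \big)^2 + 2\, \mathfrak{h}\big( M(x ;\, x[1,l]) \,,\, \widetilde{M}(x) \big)^2 .
\end{align*}

Taking expectation over $X, J, L$, we get the claimed bound. ■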

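Proof of Lemma 2.9: We have

\[ \mathrm{I}(X : M(X ;\, X[1,J]) \mid X[1,J]) \;\leq\; 2\, \mathrm{I}(X : M^0 \mid X[1,K]) \;\leq\; 2 d_1 n . \tag{2.4} \]

Fix a sample point $(x[1,j], j)$, with $j \in [n/2]$. By the Chain Rule (Proposition 2.3),

\begin{align}
\mathrm{I}\big( X[j+1,n] : M(x[1,j]\, X[j+1,n] ;\, x[1,j]) \big)
&= \sum_{l=j+1}^{n} \mathrm{I}\big( X_l : M(x[1,j]\, X[j+1,n] ;\, x[1,j]) \mid X[j+1,l-1] \big) \tag{2.5} \\
&\geq \sum_{l=n/2+1}^{n} \mathrm{I}\big( X_l : M(x[1,j]\, X[j+1,n] ;\, x[1,j]) \mid X[j+1,l-1] \big) . \tag{2.6}
\end{align}

Moreover, by the Average Encoding Theorem (Proposition 2.4) and the Triangle Inequality, for any
given $x[1,l-1], j, l$ with $l \in [n] - [n/2]$,

\begin{align}
& \mathfrak{h}\big( M(x[1,l-1]\, x_l\, X[l+1,n] ;\, x[1,j]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,j]) \big)^2 \notag \\
&\quad \leq 4 \kappa \, \mathrm{I}\big( X_l : M(x[1,l-1]\, X_l\, X[l+1,n] ;\, x[1,j]) \big) . \tag{2.7}
\end{align}

Combining Eqs. (2.4), (2.6), and (2.7), we get

\begin{align*}
& \mathbb{E}_{(x[1,l],j,l) \leftarrow (X[1,L],J,L)} \; \mathfrak{h}\big( M(x[1,l]\, X[l+1,n] ;\, x[1,j]) \,,\, M(x[1,l-1]\, \bar{x}_l\, X[l+1,n] ;\, x[1,j]) \big)^2 \\
&\quad \leq 4 \kappa \; \mathbb{E}_{(x[1,l-1],j,l) \leftarrow (X[1,L-1],J,L)} \; \mathrm{I}\big( X_l : M(x[1,l-1]\, X_l\, X[l+1,n] ;\, x[1,j]) \big) \\
&\quad = 4 \kappa \; \mathbb{E}_{(x[1,j],j,l) \leftarrow (X[1,J],J,L)} \; \mathrm{I}\big( X_l : M(x[1,j]\, X[j+1,n] ;\, x[1,j]) \mid X[j+1,l-1] \big) \\
&\quad \leq \frac{8 \kappa}{n} \, \mathrm{I}(X : M(X ;\, X[1,J]) \mid X[1,J]) \;\leq\; 16 \kappa d_1 ,
\end{align*}

as claimed. ■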



3 The connection with streaming algorithms 

Streaming algorithms are algorithms of a simple form, intended to process massive problem in- 
stances rapidly, ideally using space that is of smaller order than the size of the input. A pass on an 
input $x \in \Sigma^n$, where $\Sigma$ is some alphabet, means that $x$ is given as an input stream $x_1, x_2, \ldots, x_n$,
which arrives sequentially, i.e., letter by letter in this order. We refer the reader to the text [25] for
a more thorough introduction to streaming algorithms. 

Definition 3.1 (Streaming algorithm) Fix an alphabet $\Sigma$. A $T$-pass streaming algorithm A
with space $s(n)$ and time $t(n)$ is an algorithm such that for every input stream $x \in \Sigma^n$:

1. A performs $T$ sequential passes on $x$;

2. A maintains a memory space of size $s(n)$ bits while reading $x$;

3. A has running time at most $t(n)$ per letter $x_i$;

4. A has pre-processing and post-processing time at most $t(n)$.

We say that $A$ is bidirectional if it is allowed to access the input in the reverse order, after
reaching the end of the input. Then the parameter $T$ is the total number of passes in either direction.

Recall that in the Augmented Index problem, one party, Alice, has an $n$-bit string $x$, and the
other party, Bob, has an integer $k \in [n]$, the prefix $x[1,k-1]$ of $x$, and a bit $b \in \{0,1\}$. Their goal
is to compute the function $f_n(x, (k, x[1,k-1], b)) = x_k \oplus b$, i.e., to determine whether $b = x_k$ or
not, by engaging in a two-party communication protocol.
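
As a small illustration of the function being computed, the sketch below evaluates $f_n$ directly from both inputs. It ignores communication entirely and only fixes the input/output convention; the function name and the use of Python lists for bit strings are assumptions of the sketch, and the index $k$ is 1-based as in the text.

```python
from typing import Sequence


def augmented_index(x: Sequence[int], k: int, prefix: Sequence[int], b: int) -> int:
    """Evaluate f_n(x, (k, x[1,k-1], b)) = x_k XOR b, with k in [n] (1-indexed)."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must lie in [n]")
    # Bob's prefix is promised to agree with Alice's string on the first k-1 bits.
    if list(prefix) != list(x[: k - 1]):
        raise ValueError("Bob's prefix must equal x[1, k-1]")
    return x[k - 1] ^ b
```

The value is 0 exactly when Bob's bit $b$ equals $x_k$, which is the set of inputs on which the "easy" distribution is supported.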

The relationship between streaming algorithms for Dyck(2) and protocols for $f_n$ is captured by a
reduction due to Magniez, Mathieu, and Nayak [23]. The reduction was originally described only for
one-pass streaming algorithms, but extends immediately to unidirectional multi-pass algorithms. 
For completeness, we sketch a proof of this theorem highlighting the differences from the one-pass 
case. 

Theorem 3.1 (Magniez, Mathieu, and Nayak) Any randomized streaming algorithm for Dyck(2)
with $T$ passes in the same direction that uses space $s$ for instances of length $4n^2$, and has worst-case
two-sided error $\delta$, yields a two-party communication protocol $\Pi$ for the Augmented Index
function $f_n$ that makes error at most $\delta$ on the uniform distribution $\mu$ over its inputs, and has
information costs $\mathrm{IC}^{\mathrm{A}}_{\mu_0}(\Pi) \le sT$ for Alice and $\mathrm{IC}^{\mathrm{B}}_{\mu_0}(\Pi) \le sT/n$ for Bob, with respect to the uniform
distribution $\mu_0$ over $f_n^{-1}(0)$.

Proof: We sketch a proof of the theorem, highlighting the sole modification we need, namely in 
the definition of information cost. We refer the reader to Ref. [23] for the details. 

We rely on the same set of hard instances of Dyck(2), which correspond to strings of length
between $2n^2$ and $4n^2$. These are padded with well-formed expressions so that the length of all
instances is exactly $4n^2$. Each hard instance corresponds to an instance of a $2n$-player communication
protocol for Ascension($n$), which is the logical OR of $n$ independent instances of the two-player
Augmented Index function $f_n$. The players are denoted by $A_i, B_i$, $i \in [n]$. A $T$-pass unidirectional
streaming algorithm for Dyck(2) that uses space $s$ results in a communication protocol $\Pi$
for Ascension($n$) with $T$ sequential iterations of messages in the order

$$A_1 \to B_1 \to A_2 \to B_2 \to \cdots \to A_n \to B_n \to A_n \to A_{n-1} \to \cdots \to A_1 .$$

Each message in this protocol is of length at most $s$, and the protocol makes the same worst-case
error $\delta$ as the streaming algorithm.

Let $M_{B_n,j}$, $j \in [T]$, denote the messages sent by $B_n$ to $A_n$ in the $T$ iterations. The protocol $\Pi$
for Ascension($n$) gives rise to a protocol for a single instance of $f_n$ through a direct sum property
of its "internal information cost". Let $\mu_0$ be the uniform distribution over the subset of $\{0,1\}^n \times
[n] \times \{0,1\}$ on which the function $f_n$ is 0. Let $(\mathbf{X}, \mathbf{k}, \mathbf{c}) = (X^i, k^i, c^i)_{i=1}^{n}$ be $n$ instances of $f_n$,
distributed according to $\mu_0^n$. Let $R$ denote the public random bits in the protocol $\Pi$ arising from
the randomness used by the streaming algorithm. The (internal) information cost of $\Pi$ is defined
as:

$$\mathrm{IC}^{\mathrm{B}}_{\mu_0^n}(\Pi) \;=\; \mathrm{I}\big(\mathbf{k}, \mathbf{c} : M_{B_n,1} \cdots M_{B_n,T} \,\big|\, \mathbf{X} R\big) .$$

This is the natural and straightforward extension of the measure used in the one-pass case, which
concentrates on $M_{B_n,1}$, the single message sent by $B_n$. Note that $\mathrm{IC}^{\mathrm{B}}_{\mu_0^n}(\Pi) \le Ts$, as each message
$M_{B_n,j}$ is of length at most $s$.

The protocol $\Pi$ may be adapted to $n$ different protocols $\Pi'_i$, $i \in [n]$, for $f_n$, by precisely the same
method of embedding an instance $(X, K, B)$ of $f_n$ into one of Ascension($n$), as described in Ref. [23,
Section 4.3]. The $2n$ players in $\Pi$ are simulated by two players, Alice and Bob, as before: Alice
simulates $A_1, B_1, A_2, B_2, \ldots, A_i$ and sends a message to Bob, who simulates $B_i, A_{i+1}, B_{i+1}, \ldots, A_n, B_n$ and
sends a message to Alice, who simulates $A_n, A_{n-1}, \ldots, A_1$, and they repeat this in the same order
a total of $T$ times. There are $2T$ messages in this protocol, starting with Alice. She uses only
public randomness, whereas Bob may use private randomness, and the protocol makes the same
distributional error (at most $\delta$) on the uniform distribution over its inputs as $\Pi$ does. The (internal)
information cost of $\Pi'_i$ is measured as

$$\mathrm{IC}^{\mathrm{B}}_{\mu_0}(\Pi'_i) \;=\; \mathrm{I}\big(k^i, c^i : M_{B_n,1} \cdots M_{B_n,T} \,\big|\, X^i R^i\big) ,$$

where $R^i$ is the public randomness in $\Pi'_i$. This is the mutual information of all the messages
sent by Bob with his input, given Alice's input, under the uniform distribution over the 0s of the
function $f_n$.

The superadditivity of mutual information gives us the direct sum result

$$\mathrm{IC}^{\mathrm{B}}_{\mu_0^n}(\Pi) \;=\; \sum_{i=1}^{n} \mathrm{IC}^{\mathrm{B}}_{\mu_0}(\Pi'_i) ,$$

as in Ref. [23, Lemma 3]. Therefore at least one protocol for $f_n$ from $(\Pi'_i)$, call it $\Pi'$, has internal
information cost at most $Ts/n$. Note that we may replace Bob's messages by the entire
message transcript in $\Pi'$ in this information cost without changing its value, as Alice's messages
are independent of $K$, given $X$, Bob's messages, and the public randomness. Moreover, the total
length of the messages sent by Alice is at most $sT$, so the mutual information of $X$ with the
entire message transcript in $\Pi'$, even given Bob's input and the public randomness, is at most $sT$. ■

The information cost trade-off in Theorem 2.6 implies that any streaming algorithm that makes a
"small" number of passes over the input requires a "large" amount of space.

Corollary 3.2 Any randomized (unidirectional) $T$-pass streaming algorithm for Dyck(2) that has
worst-case two-sided error $\delta < 1/4$ uses space at least

y/N 1 
~ ^ 6 + 4\/2 

on instances of length N . 

4 Quantum information cost of Augmented Index 

We now turn to quantum communication, and present the necessary background in Section 4.1. In
Section 4.2 we show how the notion of average encoding may be applied also to quantum protocols
for Augmented Index. The analysis of quantum protocols for Augmented Index involves a
number of additional subtleties, which are also described along the way.

4.1 Quantum information theory and communication 

We continue the use of capital letters to denote random variables. We see these as special cases 
of quantum states, which are trace one positive semi-definite matrices. Random variables may be 
viewed as quantum states that are diagonal in a canonical basis. Quantum states are also denoted 
by capital letters P, Q, etc. 



1 - Ae 2yH(2e)

The trace distance $\|A - B\|_{\mathrm{tr}}$ between two quantum states $A, B$ over the same Hilbert space is the
metric induced by the trace norm $\|M\|_{\mathrm{tr}} = \mathrm{Tr}\sqrt{M^\dagger M}$. The Bures distance $\mathfrak{h}(A, B)$ between the
states is defined as

$$\mathfrak{h}(A, B) \;=\; \Big[\, 1 - \big\| \sqrt{A}\,\sqrt{B} \big\|_{\mathrm{tr}} \Big]^{1/2} .$$

For pure states $|\psi_1\rangle, |\psi_2\rangle$ we use $\mathfrak{h}(|\psi_1\rangle, |\psi_2\rangle)$ as shorthand for $\mathfrak{h}(|\psi_1\rangle\langle\psi_1|, |\psi_2\rangle\langle\psi_2|)$. Bures
distance is related to $\ell_1$ distance in the following manner.

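Proposition 4.1 Let $P, Q$ be quantum states over the same Hilbert space. Then

$$\mathfrak{h}(P, Q)^2 \;\le\; \tfrac{1}{2}\, \|P - Q\|_{\mathrm{tr}} \;\le\; \sqrt{2}\; \mathfrak{h}(P, Q) .$$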
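
For diagonal states, i.e., probability distributions, the Bures distance defined above reduces to $[1 - \sum_i \sqrt{p_i q_i}]^{1/2}$ and the trace distance to the $\ell_1$ distance. The sketch below computes both in this classical special case and can be used to check numerically the standard two-sided relationship $\mathfrak{h}^2 \le \tfrac{1}{2}\|P - Q\|_{\mathrm{tr}} \le \sqrt{2}\,\mathfrak{h}$. The function names are hypothetical, and the restriction to diagonal states is an assumption made to keep the example free of matrix libraries.

```python
import math
import random


def bures_diag(p, q):
    """Bures distance [1 - sum_i sqrt(p_i q_i)]^(1/2) for diagonal states p, q."""
    fidelity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - fidelity))


def trace_dist_diag(p, q):
    """Trace norm of p - q for diagonal states, which is the l1 distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def random_dist(m, rng):
    """A random probability distribution on m points (for experiments)."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]
```

With `rng = random.Random(0)`, comparing `bures_diag(p, q) ** 2`, `trace_dist_diag(p, q) / 2`, and `math.sqrt(2) * bures_diag(p, q)` on random pairs exhibits the ordering stated above.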

In the following, let $(p_x), (q_y)$ be distributions over the finite sample spaces $S, S'$, respectively.

The square of the Bures distance is convex in the following sense. Suppose two quantum states $P, Q$
are block diagonal in the same basis $|x\rangle$ for the space $\mathbb{C}^{S}$, and the blocks corresponding to $x$ in $P, Q$
have the same trace $p_x$.

Proposition 4.2 Let $P_x, Q_x$ be quantum states over the same finite-dimensional Hilbert space for each $x \in S$.
Let $P = \sum_{x \in S} p_x |x\rangle\langle x| \otimes P_x$, and $Q = \sum_{x \in S} p_x |x\rangle\langle x| \otimes Q_x$. Then

$$\mathfrak{h}(P, Q)^2 \;=\; \sum_{x \in S} p_x\, \mathfrak{h}(P_x, Q_x)^2 .$$
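
The convexity identity can be checked by hand in the classical special case where every block is itself diagonal. The following sketch, under that assumption and with hypothetical helper names, flattens the block-diagonal states into joint distributions and compares the two sides.

```python
import math


def bures_sq_diag(p, q):
    """Squared Bures distance 1 - sum_y sqrt(p_y q_y) for diagonal states."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))


def joint(px, blocks):
    """Diagonal of sum_x p_x |x><x| (tensor) P_x, flattened into one list."""
    return [w * v for w, block in zip(px, blocks) for v in block]


def convexity_sides(px, p_blocks, q_blocks):
    """Return (h(P,Q)^2, sum_x p_x h(P_x,Q_x)^2) for block-diagonal P and Q."""
    lhs = bures_sq_diag(joint(px, p_blocks), joint(px, q_blocks))
    rhs = sum(w * bures_sq_diag(a, b) for w, a, b in zip(px, p_blocks, q_blocks))
    return lhs, rhs
```

The two returned values agree up to rounding because $\sqrt{p_x P_x(y) \cdot p_x Q_x(y)} = p_x \sqrt{P_x(y) Q_x(y)}$, which is the whole content of the identity in this special case.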

The Local Transition Theorem due to Uhlmann [27] helps us find purifications of quantum states
that achieve the Bures distance between them.

Proposition 4.3 (Local Transition Theorem) Let $|\psi_1\rangle$ and $|\psi_2\rangle$ be two pure states in a tensor
product $\mathcal{H}_1 \otimes \mathcal{H}_2$ of Hilbert spaces. Then there exists a unitary operator $U$ on $\mathcal{H}_1$ such that

$$\mathfrak{h}\big((U \otimes I)|\psi_1\rangle,\, |\psi_2\rangle\big) \;=\; \mathfrak{h}\big(\mathrm{Tr}_{\mathcal{H}_1} |\psi_1\rangle\langle\psi_1|,\; \mathrm{Tr}_{\mathcal{H}_1} |\psi_2\rangle\langle\psi_2|\big) .$$

We rely on a number of standard results from quantum information theory in this work. For a
comprehensive introduction to the subject, we refer the reader to a text such as [27].

Let $\mathrm{S}(P)$ denote the von Neumann entropy of the quantum state $P$, and $\mathrm{I}(P : Q)$ denote the mutual
information between the two parts of a joint quantum state $PQ$.

For a joint quantum state $XQ = \sum_{x \in S} p_x |x\rangle\langle x| \otimes Q_x$ we define the conditional von Neumann
entropy as $\mathrm{S}(Q \mid X) = \sum_{x \in S} p_x\, \mathrm{S}(Q_x)$. Similarly, for a joint state $XPQ = \sum_{x \in S} p_x |x\rangle\langle x| \otimes P_x Q_x$,
where $P_x Q_x$ is a joint state for each $x \in S$, we define the conditional mutual information as

$$\mathrm{I}(P : Q \mid X) \;=\; \mathrm{S}(P \mid X) + \mathrm{S}(Q \mid X) - \mathrm{S}(PQ \mid X) .$$

The chain rule for mutual information states:

Proposition 4.4 (Chain rule) Let $XYQ = \sum_{x \in S,\, y \in S'} p_x q_y |xy\rangle\langle xy| \otimes Q_{xy}$ be a joint quantum
state. Then

$$\mathrm{I}(XY : Q) \;=\; \mathrm{I}(X : Q) + \mathrm{I}(Y : Q \mid X) .$$

The Average Encoding Theorem [20, 14] also holds for quantum states. (In fact, it was first
formulated in the context of quantum communication.)

Proposition 4.5 (Average encoding theorem) Let $XQ = \sum_{x \in S} p_x |x\rangle\langle x| \otimes Q_x$ be a joint quantum
state. Then,

$$\mathbb{E}_{x \leftarrow X}\; \mathfrak{h}(Q_x, Q)^2 \;\le\; \kappa\, \mathrm{I}(X : Q) ,$$

where $\kappa$ is the constant $\frac{\ln 2}{2}$.
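
The Average Encoding Theorem can be illustrated in the classical special case, where each $Q_x$ is a probability distribution, $Q$ is their average, and $\mathrm{I}(X:Q)$ is the usual mutual information in bits. The sketch below returns both sides of the inequality. The value $\kappa = (\ln 2)/2$ used here is an assumption of the sketch; it is the constant obtained from the standard bound $1 - \sum_y \sqrt{p_y q_y} \le \frac{\ln 2}{2} D(p \| q)$ with the divergence measured in bits. The function names are hypothetical.

```python
import math

KAPPA = math.log(2) / 2  # assumed value of the constant kappa (see lead-in)


def bures_sq_diag(p, q):
    """Squared Bures distance 1 - sum_y sqrt(p_y q_y) for diagonal states."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))


def average_encoding_sides(px, blocks):
    """Return (E_x h(Q_x, Q)^2, KAPPA * I(X:Q)) for a classical joint state XQ.

    px[x] is the probability of x and blocks[x] is the distribution Q_x.
    """
    size = len(blocks[0])
    # Q is the average encoding sum_x p_x Q_x.
    avg = [sum(w * block[y] for w, block in zip(px, blocks)) for y in range(size)]
    lhs = sum(w * bures_sq_diag(block, avg) for w, block in zip(px, blocks))
    # I(X:Q) = sum_x p_x D(Q_x || Q), measured in bits.
    info_bits = sum(
        w * v * math.log2(v / m)
        for w, block in zip(px, blocks)
        for v, m in zip(block, avg)
        if w > 0 and v > 0
    )
    return lhs, KAPPA * info_bits
```

When the encodings $Q_x$ carry little information about $x$, the right-hand side is small, and the sketch shows that each $Q_x$ is then close to the average encoding $Q$ in Bures distance.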

We briefly describe the model of two-party quantum communication that we study. Following
the model introduced by Yao [29], two "players", Alice and Bob, hold some number of qubits,
which initially factor into a tensor product $\mathcal{A} \otimes \mathcal{H}_{A,0} \otimes \mathcal{H}_{B,0} \otimes \mathcal{B}$ of Hilbert spaces. The qubits
corresponding to $\mathcal{A} \otimes \mathcal{H}_{A,0}$ are in Alice's possession, and those in $\mathcal{H}_{B,0} \otimes \mathcal{B}$ are in Bob's possession.
We restrict ourselves to protocols with classical inputs and outputs. When the game starts, Alice
holds a classical input represented by a bit string $x$ and similarly Bob holds $y$. In other words,
the qubits in space $\mathcal{A}$ are initialized to $|x\rangle$ and those in $\mathcal{B}$ are initialized to $|y\rangle$. The qubits in the
spaces $\mathcal{H}_{A,0} \otimes \mathcal{H}_{B,0}$ are intended to be the workspace of the two parties, and are initialized to a
possibly entangled state $|\Phi\rangle$ that is independent of the inputs Alice and Bob have. The initial joint
state is thus $|x\rangle \otimes |\Phi\rangle \otimes |y\rangle$.

The protocol consists of some number $t \ge 1$ of rounds of message exchange, in which the two
players "play" alternately (any party may be the first to play). Suppose it is Alice's turn to play
in round $i$, with $i \ge 1$. Suppose the workspace of the two players just before the round factors
as $\mathcal{H}_{A,i-1} \otimes \mathcal{H}_{B,i-1}$. Alice applies a unitary operator $V_{i,x}$ to the qubits in $\mathcal{H}_{A,i-1}$. Note that her
unitary depends on her input $x$ and the round. We will have occasion to consider runs of the protocol
on superpositions of inputs. In this case, we think of Alice as applying the unitary $\sum_x |x\rangle\langle x| \otimes V_{i,x}$
to the qubits in the space $\mathcal{A} \otimes \mathcal{H}_{A,i-1}$. Then, Alice sends some of her qubits to Bob. Formally, the
space $\mathcal{H}_{A,i-1}$ factors as $\mathcal{H}_{A,i} \otimes \mathcal{M}_i$, where $\mathcal{M}_i$ denotes the message space, and $\mathcal{H}_{B,i} = \mathcal{M}_i \otimes \mathcal{H}_{B,i-1}$.
As a result, Bob may now apply a unitary operation to the qubits previously in Alice's control.

At the end of the $t$ rounds of message exchange, the player to receive the last message, say Bob,
measures the qubits in his possession (those in $\mathcal{H}_{B,t}$) according to a general measurement that may
depend on his input $y$. The measurement outcome is considered to be the output of the protocol.

We emphasize that the input qubits in the protocol are read only, and that there are no intermediate 
measurements. A more general protocol may be transformed into this form by appealing to standard 
techniques. 

4.2 The quantum information cost trade-off 

In this section, we derive an analogue of the information trade-off result established in Section 2.2
for quantum communication protocols for Augmented Index.

We first define the notion of quantum information cost for the Augmented Index function $f_n$. As
in Section 2, let $(X, K, B)$ be random variables distributed according to $\mu$, the uniform distribution
over $\{0,1\}^n \times [n] \times \{0,1\}$. Let $\mu_0$ denote the distribution conditioned upon $X_K = B$, i.e., when the
inputs are chosen uniformly from the set of 0s of $f_n$. We are interested in the quantum information
cost of a protocol $\Pi$ for Augmented Index under the distribution $\mu_0$, for the two parties.

A significant difference between the classical and quantum information costs arises because the
no-cloning principle [27] prevents the two parties from keeping a copy of the messages. A natural
notion of a transcript that encapsulates the history of a quantum protocol is instead the sequence
of the joint states after each message exchange. Correspondingly, the notion of information cost
is also different from the one in the classical case. A second point of departure from the classical
case is that we consider the information contained about a superposition of inputs corresponding
to the distribution of interest. This information is in general more than the information contained
about a distribution over inputs, and the resulting notion seems to be necessary for the proof of
the information cost trade-off we present. The final, technical point of difference comes from the
manner in which the input is distributed among the two parties. Since Alice and Bob share $X[1,K]$,
when the input registers are initialized with superpositions corresponding to $\mu_0$, the two parties
already begin with some information about each other's input. Unlike in the classical case, this
enables Alice to get information about the index $K$. The effect of sharing the prefix $X[1,K]$ is
identical to that of measuring the first $K$ qubits of Alice's superposition in the computational basis.
This results in states with varying amounts of von Neumann entropy for different indices, which leaks
information about the index $K$. To quantify the information leaked by the protocol, we therefore
imagine that there is a single quantum register that carries the superposition corresponding to $X$,
and that Bob has read-only access to the relevant portion of this register. The information cost is
then measured with respect to this register.

As explained above, we adopt the following convention with respect to the inputs for Augmented
Index in the rest of this section. We imagine that Alice is given the input $x$, and Bob is given $k$, $b$,
and access to the prefix $x[1,k-1]$, rather than a copy of these bits. When we restrict to the
distribution $\mu_0$, we assume he has read-only access to $x[1,k]$. This means that the local unitary
operations used by Bob during the protocol are controlled by the register holding this prefix.

We suppose that there are a total of $t$ messages, beginning with Alice and alternating with Bob.
This is solely to eliminate awkwardness in defining and referring to quantum information cost as we
do below, and may be removed without affecting the results. Alternatively, if Bob starts, we may
modify the protocol so that Alice sends a single qubit in a fixed state, say $|0\rangle$, at the beginning.
This does not affect the information cost, but increases the number of messages by one.

Let $P_i Q_i$ denote the joint state of Alice and Bob's workspace in a protocol $\Pi$ for Augmented
Index when we start with a uniform superposition $\tilde{X}$ over strings $x \in \{0,1\}^n$ and the random
inputs $K, B$ with Bob (this corresponds to distribution $\mu$), and let $P_i^0 Q_i^0$ denote the analogous joint
state corresponding to $\mu_0$, immediately after the $i$th message is sent. The quantum information
cost of $\Pi$ for Alice and Bob with respect to $\mu_0$ is then defined as

$$\mathrm{QIC}^{\mathrm{A}}_{\mu_0}(\Pi) \;=\; \sum_{\text{odd } i \in [t]} \mathrm{I}\big(X : Q_i^0 \,\big|\, X[1,K]\big) , \quad \text{and}$$

$$\mathrm{QIC}^{\mathrm{B}}_{\mu_0}(\Pi) \;=\; \sum_{\text{even } i \in [t]} \mathrm{I}\big(K : \tilde{X} P_i^0\big) .$$

Note that there is an asymmetry in the manner we quantify quantum information cost. In Alice's 
cost, we measure the information about a uniformly random string X in Bob's quantum state, 
given the prefix to which he has access. In Bob's cost, we measure the information about a random 
index K in the joint state of strings x in superposition and Alice's workspace qubits. Although 
we could also consider superpositions over x in Alice's cost and over k in Bob's cost, we chose the 
above notions as they give us the strongest result. The information quantities with superpositions 
are always bounded from below by the ones with random variables, due to the monotonicity of 
mutual information under quantum operations. 

The intuition behind the lower bound on quantum information cost is the same as that in the
classical case. Namely, starting from an input pair on which the function evaluates to 0, if the
information cost of any one party is low and we carefully change her input, the other party's share
of the state does not change much. Assume for simplicity that Alice produces the output of the
protocol. We show that even when we simultaneously change both parts of the input, resulting
in a 1-input of the function, the perturbation to Alice's final state is also correspondingly small.
This implies that the two information costs cannot be small simultaneously. In the final piece of
the argument above, the Local Transition Theorem and a hybrid argument take the place of the
Cut-and-Paste Lemma. Unlike the latter, these are applied on a message-by-message basis, à la
Jain, Radhakrishnan, and Sen [14], and this leads to a dependence of the information cost trade-off on
the number of messages in the protocol.

The next theorem executes this argument for even $n$. A similar result also holds for odd $n$, and
may be inferred from the proof for the even case.



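Theorem 4.6 Let $\Pi$ be any quantum two-party communication protocol for the Augmented Index
function $f_n$ with $n$ even, Alice starting and alternating with Bob for a total of $t \ge 1$ messages.
If $\Pi$ makes error at most $\epsilon \in [0, 1/4]$ on the uniform distribution $\mu$ over inputs, then

$$\left[ \frac{4}{n}\, \mathrm{QIC}^{\mathrm{A}}_{\mu_0}(\Pi) \right]^{1/2} + \left[ 2\, \mathrm{QIC}^{\mathrm{B}}_{\mu_0}(\Pi) \right]^{1/2} \;\ge\; \frac{1 - 4\epsilon}{4\sqrt{\kappa t}} ,$$

where $\mu_0$ is the uniform distribution over $f_n^{-1}(0)$.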

Proof: Consider a protocol $\Pi$ as in the statement of the theorem. Let the inputs be given
by random variables $X, K, B$, drawn from the distribution $\mu$, let $d := \mathrm{QIC}^{\mathrm{A}}_{\mu_0}(\Pi)/n$, and let $c := \mathrm{QIC}^{\mathrm{B}}_{\mu_0}(\Pi)$.

Let $\tilde{X} P_i Q_i K B$ be the joint state of the registers used in the protocol, when the inputs are initialized
with a uniform superposition $\tilde{X}$ over $x \in \{0,1\}^n$ and random variables $K, B$, immediately after
the $i$th message in the protocol. Let $d_i := \frac{1}{n}\, \mathrm{I}(X : Q_i^0 \mid X[1,K])$ for odd $i \in [t]$, and $c_i := \mathrm{I}(K : \tilde{X} P_i^0)$
for even $i \in [t]$. So $d = \sum_{\text{odd } i \in [t]} d_i$ and $c = \sum_{\text{even } i \in [t]} c_i$.

We prove the theorem assuming that Alice computes the output of the protocol, i.e., $t$ is even. The
proof when Bob computes the output is similar; we point out the main differences along the way.
If $t$ is even, we show that the state $\tilde{X} P_t^0$ is close in trace distance to the state $\tilde{X} P_t^1$, where $\tilde{X} P_t^1$
denotes the reduced state $\tilde{X} P_t$ conditioned on the function value being 1, i.e., when $B = \bar{X}_K$.
(Note that $X$ is the classical random variable corresponding to the superposition $\tilde{X}$.)

Lemma 4.7 $\big\| \tilde{X} P_t^0 - \tilde{X} P_t^1 \big\|_{\mathrm{tr}} \;\le\; 1 + 4\sqrt{\kappa t}\, \big[ 2\sqrt{d} + \sqrt{2c} \big]$,

where $\kappa$ is the constant from the Average Encoding Theorem (Proposition 4.5).

If $t$ is odd, i.e., Bob computes the output of the protocol, we show the same bound on

$$\big\| Q_t^0\, X[1,K] - Q_t^1\, X[1,K-1]\, \bar{X}_K \big\|_{\mathrm{tr}} .$$

Since the protocol identifies the two states $\tilde{X} P_t^0$ and $\tilde{X} P_t^1$ with average error $\epsilon$, we have

$$\big\| \tilde{X} P_t^0 - \tilde{X} P_t^1 \big\|_{\mathrm{tr}} \;\ge\; 2(1 - 2\epsilon) .$$

The theorem follows.

We now prove the core of the theorem, i.e., that if Alice computes the output, her final states for
the 0 and 1 inputs are close to each other in distribution.

Proof of Lemma 4.7: When we wish to explicitly write a state, say $P_i$, as a function of the
inputs to Alice and Bob, say $x$ and $x[1,k-1], b$ respectively, we write it as $P_i(x; x[1,k-1], b)$.
If $b = x_k$, we write Bob's input as $x[1,k]$.

As before, for any $x \in \{0,1\}^n$ and $i \in [n]$, we let $x^{(i)}$ denote the string that equals $x$ in all
coordinates except at the $i$th. Note that $\tilde{X} P_t^1 = \tilde{X} P_t(X; X[1,K-1], \bar{X}_K)$ is the same mixed state
as $\tilde{X}^{(K)} P_t(X^{(K)}; X[1,K])$, since $X$ and $X^{(K)}$ are identically distributed. Thus, our goal is to bound

$$\big\| \tilde{X} P_t(X; X[1,K]) - \tilde{X}^{(K)} P_t(X^{(K)}; X[1,K]) \big\|_{\mathrm{tr}} .$$

For reasons similar to those in the classical case and new ones arising from our proof below, we
consider the trace distance between the first term above with $K \in [n/2]$ and the second term
with $K \in [n] - [n/2]$. (Recall that in the classical case, we restricted ourselves to $K \in [n] - [n/2]$
in both terms.) Let $J$ be uniformly and independently distributed in $[n/2]$, and let $L$ be uniformly
and independently distributed in $[n] - [n/2]$. Then

$$\begin{aligned}
\big\| \tilde{X} P_t(X; X[1,K]) - \tilde{X}^{(K)} P_t(X^{(K)}; X[1,K]) \big\|_{\mathrm{tr}}
&\le 1 + \tfrac{1}{2} \big\| \tilde{X} P_t(X; X[1,J]) - \tilde{X}^{(L)} P_t(X^{(L)}; X[1,L]) \big\|_{\mathrm{tr}} \\
&= 1 + \tfrac{1}{2} \big\| \tilde{X}^{(L)} P_t(X^{(L)}; X[1,J]) - \tilde{X}^{(L)} P_t(X^{(L)}; X[1,L]) \big\|_{\mathrm{tr}} . \qquad (4.1)
\end{aligned}$$

So it suffices to bound the RHS above. If $t$ is odd, we instead bound

$$\begin{aligned}
&\big\| Q_t(X; X[1,K])\, X[1,K] - Q_t(X^{(K)}; X[1,K])\, X[1,K] \big\|_{\mathrm{tr}} \\
&\qquad \le\; 1 + \tfrac{1}{2} \big\| Q_t(X; X[1,L])\, X[1,L] - Q_t(X^{(L)}; X[1,L])\, X[1,L] \big\|_{\mathrm{tr}} .
\end{aligned}$$

This expression is similar to the one we had in the classical case: we focus on the case $K \in [n] - [n/2]$
alone.

For every $j \in [n/2]$, $l \in [n] - [n/2]$ and $z \in \{0,1\}^l$, we consider four runs of the protocol $\Pi$. The
inputs to Alice and Bob in the four runs are summarized in the table below. Only the first $l$ bits
of Alice's input are specified. In all four runs, the last $(n-l)$ input bits of Alice are initialized to a
uniform superposition over all $(n-l)$-bit strings. The final column gives the notation for the (pure)
state corresponding to the registers $\tilde{X}[l+1,n]\, P_i Q_i$, which constitute the last $(n-l)$ input bits of
Alice, her workspace, and that of Bob, immediately after the $i$th message has been sent, $i \in [t]$.

    Run   Alice's input x[1,l]   Bob's input k, x[1,k-1], b   State
    00    $z$                    $j,\ z[1,j-1],\ z_j$         $|\phi_i(z, j)\rangle$
    01    $z$                    $l,\ z[1,l-1],\ z_l$         $|\phi_i(z, l)\rangle$
    10    $z^{(l)}$              $j,\ z[1,j-1],\ z_j$         $|\phi_i(z^{(l)}, j)\rangle$
    11    $z^{(l)}$              $l,\ z[1,l-1],\ z_l$         $|\phi_i(z^{(l)}, l)\rangle$

The "Run" column indicates whether Alice's $l$th bit has been switched, and whether we have
switched $j$ to $l$. Note that in the first three runs of the protocol, we expect the output to be 0, and
in the last run, we expect it to be 1.
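
The expected outputs of the four runs can be checked mechanically, since the value of Augmented Index depends only on the first $l$ bits of Alice's input when Bob's index is at most $l$. The sketch below evaluates the function on the specified prefix for each run; the helper names and the list encoding of $z$ are assumptions made for illustration, and indices are 1-based as in the text.

```python
def f(x, k, b):
    """Augmented Index value x_k XOR b (k is 1-indexed)."""
    return x[k - 1] ^ b


def flip(z, l):
    """Return z with its l-th bit (1-indexed) flipped, i.e. the string z^(l)."""
    return z[: l - 1] + [1 - z[l - 1]] + z[l:]


def four_runs(z, j, l):
    """Function values for runs 00, 01, 10, 11, given the prefix z of length l.

    The first label bit says whether Alice's l-th bit is flipped, the second
    whether Bob's index is switched from j to l. Bob's bit is always taken
    from the unflipped string z.
    """
    assert 1 <= j < l == len(z)
    z_flipped = flip(z, l)
    return {
        "00": f(z, j, z[j - 1]),
        "01": f(z, l, z[l - 1]),
        "10": f(z_flipped, j, z[j - 1]),
        "11": f(z_flipped, l, z[l - 1]),
    }
```

Only run 11 evaluates to 1: flipping Alice's $l$th bit is invisible to a Bob who holds index $j < l$, and switching Bob's index is harmless while Alice's string is unchanged, but the two changes together produce a mismatch at position $l$.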
We compare the intermediate protocol states in the above four runs, when we flip the $l$th input bit
of Alice, and when we switch Bob's input from $j$ to $l$ (along with the corresponding prefix). We
show that the switch results in a perturbation to the reduced state of the other party that is related
to the information contained about the bit or the index (as in the classical case). To quantify this
perturbation, define

$$h_i(j,l,z) \;=\; \mathfrak{h}\big(Q_i(z\,X[l+1,n];\, z[1,j]),\; Q_i(z^{(l)} X[l+1,n];\, z[1,j])\big) ,$$

for every odd $i \in [t]$. Define

$$h_i(j,l,z) \;=\; \mathfrak{h}\big(\tilde{X}[l+1,n]\, P_i(z\,X[l+1,n];\, z[1,j]),\; \tilde{X}[l+1,n]\, P_i(z\,X[l+1,n];\, z[1,l])\big) ,$$

for every even $i \in [t]$. In the above states, $P_i$ is entangled with the qubits holding $\tilde{X}$, and is written
as a function of $X[l+1,n]$ to emphasize this.

The number of qubits Alice and Bob have during the protocol changes with every message. To
maintain simplicity of notation, we denote the identity operator in any round on the register
holding $\tilde{X}[l+1,n]$ and Alice's workspace qubits by $I_A$ and the identity operator on Bob's workspace
qubits by $I_B$.

We begin by showing that changing Bob's input alone from $j$ to $l$, while keeping Alice's input fixed
at $z\,X[l+1,n]$, does not perturb Alice's reduced state in any round of communication by much,
provided the corresponding information cost of Bob is small. By the Local Transition Theorem, we
then see that Bob may apply a unitary operation to his qubits alone to bring the protocol states
close to each other.

Lemma 4.8 For every even $i \in [t]$, there is a unitary operator $U_i$ that depends upon $j, l, z$, acts on
Bob's workspace qubits alone (i.e., on the register holding state $Q_i$), and is such that

$$\mathfrak{h}\big((I_A \otimes U_i)|\phi_i(z,j)\rangle,\; |\phi_i(z,l)\rangle\big) \;=\; h_i(j,l,z) .$$

Moreover,

$$\mathbb{E}_{(j',l',z') \leftarrow (J,L,X[1,L])}\; h_i(j',l',z') \;\le\; \sqrt{8\kappa c_i} .$$

The proof is presented later in this section.

Next, we show that if the information cost of Alice is small, Bob's state does not carry much
information about $X$, even given a prefix. Therefore, flipping a bit outside the prefix does not
perturb Bob's state by much, and there is a unitary operation on Alice's qubits which brings the
joint states close to each other.

Lemma 4.9 For every odd $i \in [t]$, there is a unitary operator $U_i$ that depends upon $j, l, z$, acts on
the qubits holding $\tilde{X}[l+1,n]$ and Alice's workspace qubits (the register holding state $P_i$), and is
such that

$$\mathfrak{h}\big((U_i \otimes I_B)|\phi_i(z,j)\rangle,\; |\phi_i(z^{(l)},j)\rangle\big) \;=\; h_i(j,l,z) .$$

Moreover,

$$\mathbb{E}_{(j',l',z') \leftarrow (J,L,X[1,L])}\; h_i(j',l',z') \;\le\; 4\sqrt{\kappa d_i} .$$

This is proven later in the section.

There is no quantum counterpart to the Cut-and-Paste Lemma, so unlike in the classical case,
the above two lemmata are by themselves not sufficient to conclude the theorem. Instead, we
combine these with a hybrid argument to show that switching from carefully chosen 0-inputs of
Augmented Index to corresponding 1-inputs does not affect the final state by "much".

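Lemma 4.10 Let $(U_i)_{i \in [t]}$ be the unitary operators given by Lemmata 4.8 and 4.9. For every
odd $r \in [t]$,

$$\mathfrak{h}\big((U_r \otimes I_B)|\phi_r(z,l)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \;\le\; h_r(j,l,z) + 2 \sum_{i=1}^{r-1} h_i(j,l,z) .$$

For every even $r \in [t]$,

$$\mathfrak{h}\big((I_A \otimes U_r)|\phi_r(z^{(l)},j)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \;\le\; h_r(j,l,z) + 2 \sum_{i=1}^{r-1} h_i(j,l,z) .$$

This is proved later in this section.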

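By the Triangle Inequality, the monotonicity of the trace distance under quantum operations, the
relationship between trace and Bures distance (Proposition 4.1), and Lemmata 4.10, 4.8, and 4.9,

$$\begin{aligned}
&\big\| \tilde{X}^{(L)} P_t(X^{(L)}; X[1,J]) - \tilde{X}^{(L)} P_t(X^{(L)}; X[1,L]) \big\|_{\mathrm{tr}} \\
&\quad \le\; \mathbb{E}_{(j,l,z) \leftarrow (J,L,X[1,L])}\; \big\| \tilde{X}[l+1,n]\, P_t(z^{(l)} X[l+1,n];\, z[1,j]) - \tilde{X}[l+1,n]\, P_t(z^{(l)} X[l+1,n];\, z[1,l]) \big\|_{\mathrm{tr}} \\
&\quad \le\; 2\sqrt{2}\; \mathbb{E}_{(j,l,z) \leftarrow (J,L,X[1,L])}\; \mathfrak{h}\big(\tilde{X}[l+1,n]\, P_t(z^{(l)} X[l+1,n];\, z[1,j]),\; \tilde{X}[l+1,n]\, P_t(z^{(l)} X[l+1,n];\, z[1,l])\big) \\
&\quad \le\; 2\sqrt{2}\; \mathbb{E}_{(j,l,z) \leftarrow (J,L,X[1,L])}\; \mathfrak{h}\big((I_A \otimes U_t)|\phi_t(z^{(l)},j)\rangle,\; |\phi_t(z^{(l)},l)\rangle\big) \\
&\quad \le\; 4\sqrt{2}\; \mathbb{E}_{(j,l,z) \leftarrow (J,L,X[1,L])}\; \sum_{i=1}^{t} h_i(j,l,z) \\
&\quad \le\; 4\sqrt{2} \left[\; \sum_{\text{odd } i \in [t]} 4\sqrt{\kappa d_i} \;+ \sum_{\text{even } i \in [t]} \sqrt{8\kappa c_i} \;\right] \\
&\quad \le\; 8\sqrt{\kappa t}\, \big[ 2\sqrt{d} + \sqrt{2c} \big] ,
\end{aligned}$$

where the last step follows from the Cauchy-Schwarz inequality applied to each of the two sums. Together with Eq. (4.1), this concludes the proof of Lemma 4.7. ■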



We turn to the deferred proofs. 

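Proof of Lemma 4.8: Note that $\tilde{X}[l+1,n]\, P_i(z\,X[l+1,n];\, z[1,k])$ for $k \le l$ is the reduced state
of $|\phi_i(z,k)\rangle$ with Bob's workspace (i.e., the register holding state $Q_i$) traced out. By the Local
Transition Theorem, Proposition 4.3, there is a unitary operator $U_i$ that depends upon $j, l, z$, acts
on Bob's workspace qubits alone, and is such that

$$\mathfrak{h}\big((I_A \otimes U_i)|\phi_i(z,j)\rangle,\; |\phi_i(z,l)\rangle\big) \;=\; h_i(j,l,z) .$$

We show that this distance is bounded on average. Consider the quantum state $\tilde{X} P_i^0$ which is
the reduced state of all quantum registers except Bob's workspace and his input $K$. We denote
by $\tilde{X} P_i(X; X[1,k])$ this state for a fixed index $k$, so that

$$\tilde{X} P_i^0 \;=\; \frac{1}{n} \sum_{k=1}^{n} \tilde{X} P_i(X; X[1,k]) .$$

By the Average Encoding Theorem, Proposition 4.5,

$$\mathbb{E}_{k \leftarrow K}\; \mathfrak{h}\big(\tilde{X} P_i(X; X[1,k]),\; \tilde{X} P_i^0\big)^2 \;\le\; \kappa c_i ,$$

where $\kappa$ is the constant from Proposition 4.5. An immediate consequence is that

$$\mathbb{E}_{j' \leftarrow J}\; \mathfrak{h}\big(\tilde{X} P_i(X; X[1,j']),\; \tilde{X} P_i^0\big)^2 \;\le\; 2\kappa c_i , \quad \text{and}$$

$$\mathbb{E}_{l' \leftarrow L}\; \mathfrak{h}\big(\tilde{X} P_i(X; X[1,l']),\; \tilde{X} P_i^0\big)^2 \;\le\; 2\kappa c_i .$$

By the Triangle Inequality, for any $j' \in [n/2]$, $l' \in [n] - [n/2]$,

$$\begin{aligned}
&\mathfrak{h}\big(\tilde{X} P_i(X; X[1,j']),\; \tilde{X} P_i(X; X[1,l'])\big)^2 \\
&\quad \le\; \Big(\mathfrak{h}\big(\tilde{X} P_i(X; X[1,j']),\; \tilde{X} P_i^0\big) + \mathfrak{h}\big(\tilde{X} P_i(X; X[1,l']),\; \tilde{X} P_i^0\big)\Big)^2 \\
&\quad \le\; 2\,\mathfrak{h}\big(\tilde{X} P_i(X; X[1,j']),\; \tilde{X} P_i^0\big)^2 + 2\,\mathfrak{h}\big(\tilde{X} P_i(X; X[1,l']),\; \tilde{X} P_i^0\big)^2 .
\end{aligned}$$

Since Bures distance is monotonic under measurements, measuring the first $l'$ qubits of $\tilde{X}$ yields

$$\begin{aligned}
&\mathfrak{h}\big(X[1,l']\, \tilde{X}[l'+1,n]\, P_i(X[1,l']\, X[l'+1,n];\, X[1,j']),\; X[1,l']\, \tilde{X}[l'+1,n]\, P_i(X[1,l']\, X[l'+1,n];\, X[1,l'])\big)^2 \\
&\quad \le\; 2\,\mathfrak{h}\big(\tilde{X} P_i(X; X[1,j']),\; \tilde{X} P_i^0\big)^2 + 2\,\mathfrak{h}\big(\tilde{X} P_i(X; X[1,l']),\; \tilde{X} P_i^0\big)^2 .
\end{aligned}$$

Moreover, by Proposition 4.2 the left hand side above is equal to

$$\mathbb{E}_{z' \leftarrow X[1,l']}\; \mathfrak{h}\big(\tilde{X}[l'+1,n]\, P_i(z' X[l'+1,n];\, z'[1,j']),\; \tilde{X}[l'+1,n]\, P_i(z' X[l'+1,n];\, z'[1,l'])\big)^2 .$$

Taking expectation over $(j', l') \leftarrow (J, L)$, and invoking the Jensen inequality, we get the claimed
bound. ■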

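Proof of Lemma 4.9: Note that $Q_i(z\,X[l+1,n];\, z[1,k])$ for $k \le l$ is the reduced state of $|\phi_i(z,k)\rangle$
with the register holding $\tilde{X}$ and Alice's workspace (the register holding state $P_i$) traced out. By the
Local Transition Theorem, Proposition 4.3, there is a unitary operator $U_i$ that depends upon $j, l, z$,
acts on the registers holding $\tilde{X}[l+1,n]\, P_i$ alone, and is such that

$$\mathfrak{h}\big((U_i \otimes I_B)|\phi_i(z,j)\rangle,\; |\phi_i(z^{(l)},j)\rangle\big) \;=\; h_i(j,l,z) .$$

We have

$$\mathrm{I}\big(X : Q_i^0(X; X[1,J]) \,\big|\, X[1,J]\big) \;\le\; 2\, \mathrm{I}\big(X : Q_i^0 \,\big|\, X[1,K]\big) \;\le\; 2 d_i n . \qquad (4.2)$$

Fix $j' \in [n/2]$ and $z'' \in \{0,1\}^{j'}$. By the Chain Rule, Proposition 4.4,

$$\begin{aligned}
\mathrm{I}\big(X[j'+1,n] : Q_i(z'' X[j'+1,n];\, z'')\big)
&= \sum_{l'=j'+1}^{n} \mathrm{I}\big(X_{l'} : Q_i(z'' X[j'+1,n];\, z'') \,\big|\, X[j'+1,l'-1]\big) \\
&\ge \sum_{l'=n/2+1}^{n} \mathrm{I}\big(X_{l'} : Q_i(z'' X[j'+1,n];\, z'') \,\big|\, X[j'+1,l'-1]\big) . \qquad (4.3)
\end{aligned}$$

Moreover, by the Average Encoding Theorem (Proposition 4.5) and the Triangle Inequality, for any
given $l' \in [n] - [n/2]$ and $z' \in \{0,1\}^{l'}$,

$$\begin{aligned}
&\mathfrak{h}\big(Q_i(z' X[l'+1,n];\, z'[1,j']),\; Q_i(z'^{(l')} X[l'+1,n];\, z'[1,j'])\big)^2 \\
&\qquad \le\; 4\kappa\, \mathrm{I}\big(X_{l'} : Q_i(z'[1,l'-1]\, X_{l'}\, X[l'+1,n];\, z'[1,j'])\big) . \qquad (4.4)
\end{aligned}$$

Combining Eqs. (4.2), (4.3), and (4.4), we get

$$\begin{aligned}
&\mathbb{E}_{(j',l',z') \leftarrow (J,L,X[1,L])}\; \mathfrak{h}\big(Q_i(z' X[l'+1,n];\, z'[1,j']),\; Q_i(z'^{(l')} X[l'+1,n];\, z'[1,j'])\big)^2 \\
&\qquad \le\; 4\kappa\; \mathbb{E}_{(j',l',z') \leftarrow (J,L,X[1,L])}\; \mathrm{I}\big(X_{l'} : Q_i(z'[1,l'-1]\, X_{l'}\, X[l'+1,n];\, z'[1,j'])\big) \\
&\qquad =\; 4\kappa\; \mathbb{E}_{(j',l',z'') \leftarrow (J,L,X[1,J])}\; \mathrm{I}\big(X_{l'} : Q_i(z'' X[j'+1,n];\, z'') \,\big|\, X[j'+1,l'-1]\big) \\
&\qquad \le\; \frac{8\kappa}{n}\, \mathrm{I}\big(X : Q_i(X; X[1,J]) \,\big|\, X[1,J]\big) \;\le\; 16\kappa\, d_i ,
\end{aligned}$$

which, by the Jensen inequality, gives the bound on the expectation of $h_i$, as claimed. ■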

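Proof of Lemma 4.10: We prove the lemma by induction over $r \in [t]$. The base case is $r = 1$. By
the convention we have adopted, Alice sends the first message. Since the joint state immediately
after the first message is independent of Bob's input, we have

$$|\phi_1(z,l)\rangle = |\phi_1(z,j)\rangle \quad \text{and} \quad |\phi_1(z^{(l)},l)\rangle = |\phi_1(z^{(l)},j)\rangle ,$$

so along with Lemma 4.9 we get

$$\mathfrak{h}\big((U_1 \otimes I_B)|\phi_1(z,l)\rangle,\; |\phi_1(z^{(l)},l)\rangle\big) \;=\; \mathfrak{h}\big((U_1 \otimes I_B)|\phi_1(z,j)\rangle,\; |\phi_1(z^{(l)},j)\rangle\big) \;=\; h_1(j,l,z) .$$

We prove that the lemma holds for $r$, assuming that it holds for $r - 1 \in [t]$. There are two cases: $r$
is odd, or $r$ is even. We conduct the argument in the second case, when $r$ is even. The argument
for $r$ odd is similar, and is omitted.

By Lemma 4.8, we have

$$\mathfrak{h}\big((I_A \otimes U_r)|\phi_r(z,j)\rangle,\; |\phi_r(z,l)\rangle\big) \;=\; h_r(j,l,z) , \qquad (4.5)$$

and by Lemma 4.9 we have

$$\mathfrak{h}\big((U_{r-1} \otimes I_B)|\phi_{r-1}(z,j)\rangle,\; |\phi_{r-1}(z^{(l)},j)\rangle\big) \;=\; h_{r-1}(j,l,z) .$$

By the induction hypothesis, we also have

$$\mathfrak{h}\big((U_{r-1} \otimes I_B)|\phi_{r-1}(z,l)\rangle,\; |\phi_{r-1}(z^{(l)},l)\rangle\big) \;\le\; h_{r-1}(j,l,z) + 2 \sum_{i=1}^{r-2} h_i(j,l,z) .$$

Now

$$|\phi_r(z,l)\rangle = (I_A \otimes V_{r,z[1,l]})\, |\phi_{r-1}(z,l)\rangle , \quad \text{and}$$

$$|\phi_r(z^{(l)},l)\rangle = (I_A \otimes V_{r,z[1,l]})\, |\phi_{r-1}(z^{(l)},l)\rangle ,$$

where $V_{r,z[1,l]}$ is the unitary operator that Bob applies on his part of the state (i.e., on the register
holding state $Q_{r-1}$) before sending the $r$th message, and similarly with $j$ in place of $l$. Note that $V_{r,z[1,l]}$ commutes with $U_{r-1}$, as they
act on disjoint sets of qubits. Since the Bures distance is invariant under unitary operators, we get

$$\mathfrak{h}\big((U_{r-1} \otimes I_B)|\phi_r(z,j)\rangle,\; |\phi_r(z^{(l)},j)\rangle\big) \;=\; h_{r-1}(j,l,z) , \qquad (4.6)$$

and

$$\mathfrak{h}\big((U_{r-1} \otimes I_B)|\phi_r(z,l)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \;\le\; h_{r-1}(j,l,z) + 2 \sum_{i=1}^{r-2} h_i(j,l,z) . \qquad (4.7)$$

Using Eqs. (4.5), (4.6), and (4.7), and observing that $U_{r-1}$ and $U_r$ act on disjoint sets of qubits,
we get

$$\begin{aligned}
&\mathfrak{h}\big((I_A \otimes U_r)|\phi_r(z^{(l)},j)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \\
&\quad \le\; \mathfrak{h}\big((I_A \otimes U_r)|\phi_r(z^{(l)},j)\rangle,\; (U_{r-1} \otimes I \otimes U_r)|\phi_r(z,j)\rangle\big) + \mathfrak{h}\big((U_{r-1} \otimes I \otimes U_r)|\phi_r(z,j)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \\
&\quad =\; h_{r-1}(j,l,z) + \mathfrak{h}\big((U_{r-1} \otimes I \otimes U_r)|\phi_r(z,j)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \\
&\quad \le\; h_{r-1}(j,l,z) + \mathfrak{h}\big((U_{r-1} \otimes I \otimes U_r)|\phi_r(z,j)\rangle,\; (U_{r-1} \otimes I_B)|\phi_r(z,l)\rangle\big) + \mathfrak{h}\big((U_{r-1} \otimes I_B)|\phi_r(z,l)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \\
&\quad \le\; h_{r-1}(j,l,z) + h_r(j,l,z) + \mathfrak{h}\big((U_{r-1} \otimes I_B)|\phi_r(z,l)\rangle,\; |\phi_r(z^{(l)},l)\rangle\big) \\
&\quad \le\; h_r(j,l,z) + 2 \sum_{i=1}^{r-1} h_i(j,l,z) .
\end{aligned}$$

(The identity operators without a subscript in this derivation act on the space of the $r$th message.)
This completes the induction step. ■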
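
The bookkeeping in the induction can be checked numerically. In the sketch below, the bound for round $r$ is obtained by unrolling the induction step (the two new terms $h_{r-1}$ and $h_r$ plus the bound for round $r-1$) and is compared with the closed form $h_r + 2\sum_{i<r} h_i$. The function names are hypothetical, and the list `h` stands for arbitrary non-negative values of $h_1, \ldots, h_t$ at a fixed triple $(j, l, z)$.

```python
def hybrid_bounds(h):
    """bounds[r-1] is the bound for round r obtained by unrolling the induction."""
    bounds = []
    for r, h_r in enumerate(h, start=1):
        if r == 1:
            # Base case: the distance after the first message equals h_1.
            bounds.append(h_r)
        else:
            # Induction step: h_{r-1} + h_r + (bound established for round r-1).
            bounds.append(h[r - 2] + h_r + bounds[-1])
    return bounds


def closed_form(h, r):
    """The closed-form bound h_r + 2 * sum_{i<r} h_i for round r (1-indexed)."""
    return h[r - 1] + 2 * sum(h[: r - 1])
```

Each round therefore adds a perturbation to the bound twice, once when the change is introduced and once when it is carried through the next step, which is the source of the factor 2 and of the dependence on the number of messages.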